linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2015-12-23	cxgb4: Update correct encoding of SGE Ingress DMA States for T6 adapter	Hariprasad Shenai
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Update Congestion Channel map for T6 adapter	Hariprasad Shenai
	Updating Congestion Channel/Priority Map in Congestion Manager Context for T6. In T6 port 0 is mapped to channel 0 and port 1 is mapped to channel 1. For 2 port T4/T5 adapter, port 0 is mapped to channel 0,1 and port 1 is mapped to channel 2,3 Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Update register range and SGE registers for T6 adapter	Hariprasad Shenai
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4/cxgb4vf: Update Ingress padding boundary values for T6 adapter	Hariprasad Shenai
	Ingress padding boundary values got changed for T6. T5: 0=32B 1=64B 2=128B 3=256B 4=512B 5=1024B 6=2048B 7=4096B T6: 0=8B 1=16B 2=32B 3=64B 4=128B 5=128B 6=256B 7=512B Updating the driver to set the correct boundary values in SGE_CONTROL to 32B. Also, need to take care of this fl alignment change when calculating the next packet offset. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Update pm_stats for T6 adapter family	Hariprasad Shenai
	Updated pm_stats code to display input FIFO wait (index 5) and read latency (index 7) counters for T6 adapters Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Pass correct argument to t4_link_l1cfg()	Hariprasad Shenai
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	bridge: use kobj_to_dev instead of to_dev	Geliang Tang
	kobj_to_dev has been defined in linux/device.h, so I replace to_dev with it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	bonding: drop unused to_dev macro in bond_sysfs.c	Geliang Tang
	to_dev is not used anymore so drop it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	dsa: mv88e6xxx: Add Second back of statistics	Andrew Lunn
	The 6320 family of switch chips has a second bank for statistics, but is missing three statistics in the port registers. Generalise and extend the code: * adding a field to the statistics table indicating the bank/register set where each statistics is. * add a function indicating if an individual statistics is available on this device * calculate at run time the sset_count. * return strings based on the available statistics of the device * return statistics based on the available statistics of the device * Add support for reading from the second bank. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	Merge branch 'sfc-vf'	David S. Miller
	Bert Kenward says: ==================== sfc: additional virtual function support This introduces the client side of a mechanism to defer authorisation of operations, for example multicast subscription. Although primarily aimed at SRIOV VFs this can also apply to unprivileged PFs. Also handle reboot ordering corner cases better and reduce the level of some logging. v2: remove #ifdef DEBUG around new WARN_ON in mcdi.c. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	sfc: Downgrade or remove some error messages	Bert Kenward
	Depending on configuration the NIC may return errors for unprivileged functions and/or VFs. Where these are expected and handled, reduce the level of any output. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	sfc: Downgrade EPERM messages from MCDI to debug	Tomáš Pilař
	When running in an unprivileged function we expect some MC commands to fail with permission errors. To avoid log spew downgrade these to debug only. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	sfc: Make failed filter removal less noisy	Bert Kenward
	There are situations - mostly reset related - where our view of the filter table differs from the hardware. In this case we may try and remove filters that aren't actually installed. This isn't that interesting in most situations, so downgrade the logging. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	sfc: Handle MCDI proxy authorisation	Bert Kenward
	For unprivileged functions operations can be authorised by an admin function. Extra steps are introduced to the MCDI protocol in this situation - the initial response from the MCDI tells us that the operation has been deferred, and we must retry when told. We then receive an event telling us to retry. Note that this provides only the functionality for the unprivileged functions, not the handling of the administrative side. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	sfc: Retry MCDI after NO_EVB_PORT error on a VF	Bert Kenward
	After reboot the vswitch configuration from the PF may not be complete before the VF attempts to restore filters. In that case we see NO_EVB_PORT errors from the MC. Retry up to a time limit or until a different result is seen. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	Merge branch 'cxgb4-next'	David S. Miller
	Hariprasad Shenai says: ==================== Trivial enhancements for cxgb4 This series adds a debug message if adapter isn't inserted in right PCI slot. Changes naming conventions for iSCSI rx queues, use node info while allocating rx queue and use napi_complete_done() api in napi handler. This patch series has been created against net-next tree and includes patches on cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. Thanks V2: Dropped 'dcb_info' debug entry patch, since the same can be achieved using lldp tool. Based on review comments by Or Gerlitz <gerlitz.or@gmail.com> and David Miller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Use napi_complete_done() api in napi handler	Hariprasad Shenai
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Use the node info to alloc_ring() for RX queues	Hariprasad Shenai
	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: get naming correct for iscsi queues	Hariprasad Shenai
	All the upper level protocols like rdma, iscsi have their own offload rx queues, so instead of using the generic naming convention be specific while naming them. Improves code readability Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23	cxgb4: Warn if device doesn't have enough PCI bandwidth	Hariprasad Shenai
	Check if the device get enough bandwidth from the entire PCI chain to satisfy its capabilities. This patch determines the PCIe device's bandwidth capabilities by reading its PCIe Link Capabilities registers and then call the pcie_get_minimum_link function to ensure that the adapter is hooked into a slot which is capable of providing the necessary bandwidth capabilities. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	Merge branch 'bindtodevice_tw_rst'	David S. Miller
	Florian Westphal says: ==================== tcp: honour SO_BINDTODEVICE for TW_RST case too This is V2, this time as a small series since I followed Erics advice to split this into smaller chunks, I hope this makes it easier to review. First patch adds inet_sk_transparent helper. Second patch contains an if/else swap that I split from the original TW_RST v1 one. Third patch is the actual change without the superfluous sock_net change. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	tcp: honour SO_BINDTODEVICE for TW_RST case too	Florian Westphal
	Hannes points out that when we generate tcp reset for timewait sockets we pretend we found no socket and pass NULL sk to tcp_vX_send_reset(). Make it cope with inet tw sockets and then provide tw sk. This makes RSTs appear on correct interface when SO_BINDTODEVICE is used. Packetdrill test case: // want default route to be used, we rely on BINDTODEVICE `ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 // test case still works due to BINDTODEVICE 0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0 0.100...0.200 connect(3, ..., ...) = 0 0.100 > S 0:0(0) <mss 1460,sackOK,nop,nop> 0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,nop,nop> 0.200 > . 1:1(0) ack 1 0.210 close(3) = 0 0.210 > F. 1:1(0) ack 1 win 29200 0.300 < . 1:1(0) ack 2 win 46 // more data while in FIN_WAIT2, expect RST 1.300 < P. 1:1001(1000) ack 1 win 46 // fails without this change -- default route is used 1.301 > R 1:1(0) win 0 Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	tcp: send_reset: test for non-NULL sk first	Florian Westphal
	tcp_md5_do_lookup requires a full socket, so once we extend _send_reset() to also accept timewait socket we would have to change if (!sk && hash_location) to something like if ((!sk \|\| !sk_fullsock(sk)) && hash_location) { ... } else { (sk && sk_fullsock(sk)) tcp_md5_do_lookup() } Switch the two branches: check if we have a socket first, then fall back to a listener lookup if we saw a md5 option (hash_location). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	net: add inet_sk_transparent() helper	Florian Westphal
	Avoids cluttering tcp_v4_send_reset when followup patch extends it to deal with timewait sockets. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	mlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure	Jiri Pirko
	KASan reported use-after-free for the hwmon structure. So fix this by using devm_kzalloc and let the core take care about freeing the memory during device dettach. Reported-by: Ido Schimmel <idosch@mellanox.com> Fixes: 89309da39 ("mlxsw: core: Implement temperature hwmon interface") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	net: tcp: deal with listen sockets properly in tcp_abort.	Lorenzo Colitti
	When closing a listen socket, tcp_abort currently calls tcp_done without clearing the request queue. If the socket has a child socket that is established but not yet accepted, the child socket is then left without a parent, causing a leak. Fix this by setting the socket state to TCP_CLOSE and calling inet_csk_listen_stop with the socket lock held, like tcp_close does. Tested using net_test. With this patch, calling SOCK_DESTROY on a listen socket that has an established but not yet accepted child socket results in the parent and the child being closed, such that they no longer appear in sock_diag dumps. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	mlxsw: core: Allow to reset temperature history via hwmon interface	Jiri Pirko
	Add another sysfs hwmon attribute to expose possibility to reset temperature sensors history. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	RDS: don't pretend to use cpu notifiers	Sebastian Andrzej Siewior
	It looks like an attempt to use CPU notifier here which was never completed. Nobody tried to wire it up completely since 2k9. So I unwind this code and get rid of everything not required. Oh look! 19 lines were removed while code still does the same thing. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	net-sysfs: use to_net_dev in net_namespace()	Geliang Tang
	Use to_net_dev() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	Merge branch '100GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2015-12-22 This series contains updates to fm10k only. Bruce cleans up the initialization of fm10k_workqueue at the global level, which fixes a checkpatch.pl error. Made several other cleanups of the driver, like making structures that do not change constant, remove unused code, cleanup code comments and use boolean states true/false instead of an integer since a bool is all that is needed. Jacob fixed the TLV format for little endian structures which are 4 byte aligned copy, so add an additional __aligned(4) and __packed to ensure that these structures are actually 4 byte aligned and packed correctly. Updated the driver to use ether_addr_equal() instead of memcmp() to compare MAC addresses. Alex Duyck cleans up the exception handling so all of the paths result in a similar state if we fail. Specifically the driver will now unload the mailbox interrupt, free the queue vectors and MSI-X, and then detach the interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22	fm10k: IS_ENABLED() is not appropriate for boolean kconfig option	Bruce Allan
	Tri-states need 'if IS_ENABLED()', booleans should use 'ifdef'. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: cleanup mailbox code comments etc	Bruce Allan
	Cleanup a number of issues with function header comments, lower-case acronyms (i.e. FIFO, TLV), duplicate comments and a stubbed-out header comment for fm10k_sm_mbx_init. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: use true/false for boolean get_host_state	Bruce Allan
	Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: remove unused struct element	Bruce Allan
	Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: constify fm10k_mac_ops, fm10k_iov_ops and fm10k_info structures	Bruce Allan
	These structures never change so declare them as const. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: address operator not needed when declaring function pointers	Bruce Allan
	Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: use ether_addr_equal instead of memcmp	Jacob Keller
	When comparing MAC addresses, use ether_addr_equal instead of memcmp to ETH_ALEN length. Found and replaced using the following sed: sed -e 's/memcmp\x28\(.*\), ETH_ALEN\x29/!ether_addr_equal\x28\1\x29/' Reported-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: Cleanup exception handling for changing queues	Alexander Duyck
	This patch is meant to cleanup the exception handling for the paths where we reset the interrupts and then reconfigure them. In all of these paths we had very different levels of exception handling. I have updated the driver so that all of the paths should result in a similar state if we fail. Specifically the driver will now unload the mailbox interrupt, free the queue vectors and MSI-X, and then detach the interface. In addition for any of the PCIe related resets I have added a check with the hw_ready function to just make sure the registers are in a readable state prior to reopening the interface. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: correctly pack TLV structures and explain reasoning	Jacob Keller
	The TLV format for little endian structures is actually 4 byte aligned copy. To this end, we need to add an additional __aligned(4) marker along with __packed to ensure that these structures are actually 4 byte aligned and packed correctly. Use of just __packed will not work as this will result in 1byte alignment which is incorrect. Add a comment explaining the reasoning behind why these structures need the special treatment. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-22	fm10k: don't initialize fm10k_workqueue at global level	Bruce Allan
	Cleans up checkpatch GLOBAL_INITIALIZERS error Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-12-20	ibmveth: consolidate kmalloc of array, memset 0 to kcalloc	Nicholas Mc Guire
	This is an API consolidation only. The use of kmalloc + memset to 0 is equivalent to kcalloc in this case as it is allocating an array of elements. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-19	netcp: fix regression in receive processing	Arnd Bergmann
	A cleanup patch I did was unfortunately wrong and introduced multiple serious bugs in the netcp rx processing, as indicated by these correct gcc warnings: drivers/net/ethernet/ti/netcp_core.c:776:14: warning: 'buf_ptr' may be used uninitialized in this function [-Wuninitialized] drivers/net/ethernet/ti/netcp_core.c:687:14: warning: 'ptr' may be used uninitialized in this function [-Wuninitialized] I have checked the patch once more and found that a call to get_pkt_info() accidentally got removed in netcp_free_rx_desc_chain, and netcp_process_one_rx_packet no longer retrieved the correct buffer length. This patch should fix all the known problems, but I did not test on real hardware. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 899077791403 ("netcp: try to reduce type confusion in descriptors") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	asix: silence log message from oversize packet	stephen hemminger
	Since it is possible for an external system to send oversize packets at anytime, it is best for driver not to print a message and spam the log (potential external DoS). Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=109471 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	tcp: diag: add support for request sockets to tcp_abort()	Eric Dumazet
	Adding support for SYN_RECV request sockets to tcp_abort() is quite easy after our tcp listener rewrite. Note that we also need to better handle listeners, or we might leak not yet accepted children, because of a missing inet_csk_listen_stop() call. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Tested-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	Merge branch 'bpf-misc-updates'	David S. Miller
	Daniel Borkmann says: ==================== Misc BPF updates This series contains a couple of misc updates to the BPF code, besides others a new helper bpf_skb_load_bytes(), moving clearing of A/X to the classic converter, etc. Please see individual patches for details. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	bpf, test: add couple of test cases	Daniel Borkmann
	Add couple of test cases for interpreter but also JITs, f.e. to test that when imm32 moves are being done, upper 32bits of the regs are being zero extended. Without JIT: [...] [ 1114.129301] test_bpf: #43 MOV REG64 jited:0 128 PASS [ 1114.130626] test_bpf: #44 MOV REG32 jited:0 139 PASS [ 1114.132055] test_bpf: #45 LD IMM64 jited:0 124 PASS [...] With JIT (generated code can as usual be nicely verified with the help of bpf_jit_disasm tool): [...] [ 1062.726782] test_bpf: #43 MOV REG64 jited:1 6 PASS [ 1062.726890] test_bpf: #44 MOV REG32 jited:1 6 PASS [ 1062.726993] test_bpf: #45 LD IMM64 jited:1 6 PASS [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	bpf, x86: detect/optimize loading 0 immediates	Daniel Borkmann
	When sometimes structs or variables need to be initialized/'memset' to 0 in an eBPF C program, the x86 BPF JIT converts this to use immediates. We can however save a couple of bytes (f.e. even up to 7 bytes on a single emmission of BPF_LD \| BPF_IMM \| BPF_DW) in the image by detecting such case and use xor on the dst register instead. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	bpf: fix misleading comment in bpf_convert_filter	Daniel Borkmann
	Comment says "User BPF's register A is mapped to our BPF register 6", which is actually wrong as the mapping is on register 0. This can already be inferred from the code itself. So just remove it before someone makes assumptions based on that. Only code tells truth. ;) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	bpf: move clearing of A/X into classic to eBPF migration prologue	Daniel Borkmann
	Back in the days where eBPF (or back then "internal BPF" ;->) was not exposed to user space, and only the classic BPF programs internally translated into eBPF programs, we missed the fact that for classic BPF A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9 ("net: filter: initialize A and X registers"), and thus classic BPF specifics were added to the eBPF interpreter core to work around it. This added some confusion for JIT developers later on that take the eBPF interpreter code as an example for deriving their JIT. F.e. in f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at least X could leak stack memory. Furthermore, since this is only needed for classic BPF translations and not for eBPF (verifier takes care that read access to regs cannot be done uninitialized), more complexity is added to JITs as they need to determine whether they deal with migrations or native eBPF where they can just omit clearing A/X in their prologue and thus reduce image size a bit, see f.e. cde66c2d88da ("s390/bpf: Only clear A and X for converted BPF programs"). In other cases (x86, arm64), A and X is being cleared in the prologue also for eBPF case, which is unnecessary. Lets move this into the BPF migration in bpf_convert_filter() where it actually belongs as long as the number of eBPF JITs are still few. It can thus be done generically; allowing us to remove the quirk from __bpf_prog_run() and to slightly reduce JIT image size in case of eBPF, while reducing code duplication on this matter in current(/future) eBPF JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Yang Shi <yang.shi@linaro.org> Acked-by: Yang Shi <yang.shi@linaro.org> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18	bpf: add bpf_skb_load_bytes helper	Daniel Borkmann
	When hacking tc programs with eBPF, one of the issues that come up from time to time is to load addresses from headers. In eBPF as in classic BPF, we have BPF_LD \| BPF_ABS \| BPF_{B,H,W} instructions that extract a byte, half-word or word out of the skb data though helpers such as bpf_load_pointer() (interpreter case). F.e. extracting a whole IPv6 address could possibly look like ... union v6addr { struct { __u32 p1; __u32 p2; __u32 p3; __u32 p4; }; __u8 addr[16]; }; [...] a.p1 = htonl(load_word(skb, off)); a.p2 = htonl(load_word(skb, off + 4)); a.p3 = htonl(load_word(skb, off + 8)); a.p4 = htonl(load_word(skb, off + 12)); [...] /* access to a.addr[...] */ This work adds a complementary helper bpf_skb_load_bytes() (we also have bpf_skb_store_bytes()) as an alternative where the same call would look like from an eBPF program: ret = bpf_skb_load_bytes(skb, off, addr, sizeof(addr)); Same verifier restrictions apply as in ffeedafbf023 ("bpf: introduce current->pid, tgid, uid, gid, comm accessors") case, where stack memory access needs to be statically verified and thus guaranteed to be initialized in first use (otherwise verifier cannot tell whether a subsequent access to it is valid or not as it's runtime dependent). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>