Age | Commit message (Collapse) | Author |
|
It's not used anywhere.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
unique for device and may even be unique for a sub-set of ports within
device, so add switchdev helper function to generate unique marks based on
port's switch ID and group_ifindex. group_ifindex would typically be the
container dev's ifindex, such as the bridge's ifindex.
The generator uses a global hash table to store offload_fwd_marks hashed by
{switch ID, group_ifindex} key.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device. If so, drop skb. A
non-zero skb->offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev->skb_mark. The switchdev port
driver would assign a non-zero dev->offload_skb_mark for each device port
netdev during registration, for example.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It would be very useful to retrieve the net_cls's classid from an eBPF
program to allow for a more fine-grained classification, it could be
directly used or in conjunction with additional policies. I.e. docker,
but also tooling such as cgexec, can easily run applications via net_cls
cgroups:
cgcreate -g net_cls:/foo
echo 42 > foo/net_cls.classid
cgexec -g net_cls:foo <prog>
Thus, their respecitve classid cookie of foo can then be looked up on
the egress path to apply further policies. The helper is desigend such
that a non-zero value returns the cgroup id.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split out retrieving the cgroups net_cls classid retrieval into its
own function, so that it can be reused later on from other parts of
the traffic control subsystem. If there's no skb->sk, then the small
helper returns 0 as well, which in cls_cgroup terms means 'could not
classify'.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.
The switch driver can react to protodown notification by doing a phys down
on the associated switch port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
net/bridge/br_mdb.c
Minor conflict in br_mdb.c, in 'net' we added a memset of the
on-stack 'ip' variable whereas in 'net-next' we assign a new
member 'vid'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now all user mdb entries were added in vlan 0, this patch adds
support to allow the user to specify the vlan for the entry.
About the uapi change a hole in struct br_mdb_entry is used so the size
and offsets are kept the same (verified with pahole and tested with older
iproute2).
Example:
$ bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000
dev br0 port eth1 grp 239.0.0.1 permanent vlan 200
dev br0 port eth1 grp 239.0.0.1 permanent
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Missing list head init in bluetooth hidp session creation, from Tedd
Ho-Jeong An.
2) Don't leak SKB in bridge netfilter error paths, from Florian
Westphal.
3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
Grall.
4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
Baechle.
5) Fix race between rhashtable resize events and table walks, from Phil
Sutter.
6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
Daniel Borkmann.
7) Missing security layer socket state initialization in tipc code,
from Stephen Smalley.
8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
Denys Vlasenko.
9) Missing minor_idr destroy on module unload on macvtap driver, from
Johannes Thumshirn.
10) Various pktgen kernel thread races, from Oleg Nesterov.
11) Fix races that can cause packets to be processed in the backlog even
after a device attached to that SKB has been fully unregistered.
From Julian Anastasov.
12) bcmgenet driver doesn't account packet drops vs. errors properly,
fix from Petri Gynther.
13) Array index validation and off by one fix in DSA layer from Florian
Fainelli
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
can: replace timestamp as unique skb attribute
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
can: c_can: Fix default pinmux glitch at init
can: rcar_can: unify error messages
can: rcar_can: print request_irq() error code
can: rcar_can: fix typo in error message
can: rcar_can: print signed IRQ #
can: rcar_can: fix IRQ check
net: dsa: Fix off-by-one in switch address parsing
net: dsa: Test array index before use
net: switchdev: don't abort unsupported operations
net: bcmgenet: fix accounting of packet drops vs errors
cdc_ncm: update specs URL
Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
bridge: mdb: allow the user to delete mdb entry if there's a querier
net: call rcu_read_lock early in process_backlog
net: do not process device backlog during unregistration
bridge: fix potential crash in __netdev_pick_tx()
net: axienet: Fix devm_ioremap_resource return value check
...
|
|
Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"This update from the timer departement contains:
- A series of patches which address a shortcoming in the tick
broadcast code.
If the broadcast device is not available or an hrtimer emulated
broadcast device, some of the original assumptions lead to boot
failures. I rather plugged all of the corner cases instead of only
addressing the issue reported, so the change got a little larger.
Has been extensivly tested on x86 and arm.
- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
- A regression fix for the imx clocksource driver
- An update to the new state callbacks mechanism for clockevents.
This is required to simplify the conversion, which will take place
in 4.3"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
time: Get rid of do_posix_clock_monotonic_gettime
cris: Replace do_posix_clock_monotonic_gettime()
tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
tick/broadcast: Handle spurious interrupts gracefully
tick/broadcast: Check for hrtimer broadcast active early
tick/broadcast: Return busy when IPI is pending
tick/broadcast: Return busy if periodic mode and hrtimer broadcast
tick/broadcast: Move the check for periodic mode inside state handling
tick/broadcast: Prevent deep idle if no broadcast device available
tick/broadcast: Make idle check independent from mode and config
tick/broadcast: Sanity check the shutdown of the local clock_event
tick/broadcast: Prevent hrtimer recursion
clockevents: Allow set-state callbacks to be optional
clocksource/imx: Define clocksource for mx27
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a cpu hotplug race vs. interrupt descriptors:
Prevent irq setup/teardown across the cpu starting/dying parts of cpu
hotplug so that the starting/dying cpu has a stable view of the
descriptor space. This has been an issue for all architectures in the
cpu dying phase, where interrupts are migrated away from the dying
cpu. In the starting phase its mostly a x86 issue vs the vector space
update"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hotplug: Prevent alloc/free of irq descriptors during cpu up/down
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
Pull libnvdimm fixes from Dan Williams:
"1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
bug fixes (patches 1-6)
2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).
Granted these are slightly large for a -rc update. They have been
out for review in one form or another since the end of May and were
deferred from the merge window while we settled on the "PMEM API"
for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
and wmb_pmem).
Now that those apis are merged we implement them in the BLK driver
to guarantee that mmio aperture moves stay ordered with respect to
incoming read/write requests, and that writes are flushed through
those mmio-windows and platform-buffers to be persistent on media.
These pass the sub-system unit tests with the updates to
tools/testing/nvdimm, and have received a successful build-report from
the kbuild robot (468 configs).
With acks from Rafael for the touches to drivers/acpi/"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
nfit: add support for NVDIMM "latch" flag
nfit: update block I/O path to use PMEM API
tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
tools/testing/nvdimm: fix return code for unimplemented commands
tools/testing/nvdimm: mock ioremap_wt
pmem: add maintainer for include/linux/pmem.h
nfit: fix smatch "use after null check" report
nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
libnvdimm: smatch cleanups in __nd_ioctl
sparse: fix misplaced __pmem definition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Kevin Hilman:
"A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
prima2 as well as a few arm64-specific DT fixes.
This series also includes a late to support a new Allwinner (sunxi)
SoC, but since it's rather simple and isolated to the
platform-specific code, it's included it for this -rc"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
arm: dts: vexpress: add missing CCI PMU device node to TC2
arm: dts: vexpress: describe all PMUs in TC2 dts
GICv3: Add ITS entry to THUNDER dts
arm64: dts: Add poweroff button device node for APM X-Gene platform
ARM: dts: am4372.dtsi: disable rfbi
ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
ARM: dts: am4372: Add emif node
Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
ARM: sunxi: Enable simplefb in the defconfig
ARM: Remove deprecated symbol from defconfig files
ARM: sunxi: Add Machine support for A33
ARM: sunxi: Introduce Allwinner H3 support
Documentation: sunxi: Update Allwinner SoC documentation
ARM: prima2: move to use REGMAP APIs for rtciobrg
ARM: dts: atlas7: add pinctrl and gpio descriptions
ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
memory: omap-gpmc: Fix parsing of devices
|
|
Pass the mdix setting from ethtool down to the phy driver, to allow
driver specific implementations of manually setting the polarity.
Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.
This add the ip_nonlocal_bind sysctl under ipv6.
Testing:
Set up nonlocal binding and receive routing on a host, e.g.:
ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1
Set up routing to 2001:0:0:1::/64 on peer to go to first host
ping6 -I 2001:0:0:1::1 peer-address -- to verify
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet_twsk_deschedule() calls are followed by inet_twsk_put().
Only particular case is in inet_twsk_purge() but there is no point
to defer the inet_twsk_put() after re-enabling BH.
Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
and move the inet_twsk_put() inside.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()
In particular, deferred inet_twsk_put() added in commit
13475a30b66cd ("tcp: connect() race with timewait reuse")
looks unecessary : When removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kernel will crash the same if one of the pointer is NULL anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hop was always either 0 or sizeof(struct ipv6hdr).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.
This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).
V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the original design slow start is only used to raise cwnd
when cwnd is stricly below ssthresh. It makes little sense
to slow start when cwnd == ssthresh: especially
when hystart has set ssthresh in the initial ramp, or after
recovery when cwnd resets to ssthresh. Not doing so will
also help reduce the buffer bloat slightly.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There is a fix for CephFS and RBD when used within containers/namespaces,
and a fix for the address learning the client is supposed to do when
initially talking to the Ceph cluster.
There are also two patches updating MAINTAINERS. One breaks out the
common Ceph code shared by fs/ceph and drivers/block/rbd.c into a
separate entry with the appropriate maintainers listed. The second
adds a second reference to the github tree where the Ceph client
development takes place (before it is pushed to korg and then to you).
The goal here is to move closer to a situation where Ilya Dryomov or
one of the other maintainers can push things to you if I am
unavailable. Ilya has done most of the work preparing branches for
upstream recently; you should not be surprised to hear from him if I
am trapped in some internet-less wasteland or hit by a bus or
something. In the meantime, we'll work on getting him added to the
kernel web of trust"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
MAINTAINERS: add secondary tree for ceph modules
MAINTAINERS: update ceph entries
libceph: treat sockaddr_storage with uninitialized family as blank
libceph: enable ceph in a non-default network namespace
|
|
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net. Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.
This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
All users gone. Remove it before we get another one.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This kernel patch exports the value of the new
ignore_routes_with_linkdown via netconf.
v2: changes to notify userspace via netlink when sysctl values change
and proposed for 'net' since this could be considered a bugfix
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"These are fixes on top of the previous PM+ACPI pull requests
(including one fix for a 4.1 regression) and two commits adding
_CLS-based device enumeration support to the ACPI core and the ATA
subsystem that waited for the latest ACPICA changes to be merged.
Specifics:
- Fix for an ACPI resources management regression introduced during
the 4.1 cycle (that unfortunately went into -stable) effectively
reverting the bad commit along with the recent fixups on top of it
and using an alternative approach to address the underlying issue
(Rafael J Wysocki).
- Fix for a memory leak and an incorrect return value in an error
code path in the ACPI LPSS (Low-Power Subsystem) driver (Rafael J
Wysocki).
- Fix for a leftover dangling pointer in an error code path in the
new wakeup IRQ support code (Rafael J Wysocki).
- Fix to prevent infinite loops (due to errors in other places) from
happening in the core generic PM domains support code (Geert
Uytterhoeven).
- Hibernation documentation update/clarification (Uwe Geuder).
- Support for _CLS-based device enumeration in the ACPI core and in
the ATA subsystem (Suravee Suthikulpanit)"
* tag 'pm+acpi-4.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / wakeirq: Avoid setting power.wakeirq too hastily
ata: ahci_platform: Add ACPI _CLS matching
ACPI / scan: Add support for ACPI _CLS device matching
PM / hibernate: clarify resume documentation
PM / Domains: Avoid infinite loops in attach/detach code
ACPI / LPSS: Fix up acpi_lpss_create_device()
ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux into fixes
Merge "CSR SiRFSoC rtc iobrg move to regmap for 4.2" from Barry Song:
move CSR rtc iobrg read/write API to be regmap
this moves to general APIs, and all drivers will be changed based
on it.
* tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux:
ARM: prima2: move to use REGMAP APIs for rtciobrg
|
|
Like act_gact, act_mirred can be lockless in packet processing
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) use rcu to protect tcfm_dev
4) Remove spinlock usage, as it is no longer needed.
Next step : add multi queue capability to ifb device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Final step for gact RCU operation :
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) Remove spinlock acquisition, as it is no longer needed.
Since this is the last contended lock in packet RX when tc gact is used,
this gives impressive gain.
My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 11 Mpps after patch.
Tested:
On receiver :
dev=eth0
tc qdisc del dev $dev ingress 2>/dev/null
tc qdisc add dev $dev ingress
tc filter del dev $dev root pref 10 2>/dev/null
tc filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
u32 match ip src 7.0.0.0/8 flowid 1:15 action drop
Sender sends packets flood from 7/8 network
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Second step for gact RCU operation :
We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.
gact_determ() would not work without a central packet counter,
so lets add it for this mode.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reuse existing percpu infrastructure John Fastabend added for qdisc.
This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.
We want to add percpu stats for tc action, so this patch add common
helpers.
qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.
When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassignes the affinities because
interrupt descriptors might be freed underneath.
Example:
CPU1 CPU2
cpu_up/down
irq_desc = irq_to_desc(irq);
remove_from_radix_tree(desc);
raw_spin_lock(&desc->lock);
free(desc);
We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kind of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:
Prevent allocation and freeing of interrupt descriptors accross cpu
hotplug.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
|
|
* acpi-scan:
ata: ahci_platform: Add ACPI _CLS matching
ACPI / scan: Add support for ACPI _CLS device matching
|
|
Making tick_broadcast_oneshot_control() independent from
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
there.
Provide a proper stub inline.
Fixes: f32dd1170511 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.
If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.
Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.
The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If its not set, idle
can proceed. If set, further checks are necessary.
Provide checks for the trivial cases:
- If broadcast is disabled in the config, then return busy
- If oneshot mode (NOHZ/HIGHES) is disabled in the config, return
busy if the broadcast device is hrtimer based.
- If oneshot mode is enabled in the config call the original
tick_broadcast_oneshot_control() function. That function needs
extra checks which will be implemented in seperate patches.
[ Split out from a larger combo patch ]
Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
|
|
Device drivers typically use ACPI _HIDs/_CIDs listed in struct device_driver
acpi_match_table to match devices. However, for generic drivers, we do not
want to list _HID for all supported devices. Also, certain classes of devices
do not have _CID (e.g. SATA, USB). Instead, we can leverage ACPI _CLS,
which specifies PCI-defined class code (i.e. base-class, subclass and
programming interface). This patch adds support for matching ACPI devices using
the _CLS method.
To support loadable module, current design uses _HID or _CID to match device's
modalias. With the new way of matching with _CLS this would requires modification
to the current ACPI modalias key to include _CLS. This patch appends PCI-defined
class-code to the existing ACPI modalias as following.
acpi:<HID>:<CID1>:<CID2>:..:<CIDn>:<bbsspp>:
E.g:
# cat /sys/devices/platform/AMDI0600:00/modalias
acpi:AMDI0600:010601:
where bb is th base-class code, ss is te sub-class code, and pp is the
programming interface code
Since there would not be _HID/_CID in the ACPI matching table of the driver,
this patch adds a field to acpi_device_id to specify the matching _CLS.
static const struct acpi_device_id ahci_acpi_match[] = {
{ ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xffffff) },
{},
};
In this case, the corresponded entry in modules.alias file would be:
alias acpi*:010601:* ahci_platform
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This effectively reverts the following three commits:
7bc10388ccdd ACPI / resources: free memory on error in add_region_before()
0f1b414d1907 ACPI / PNP: Avoid conflicting resource reservations
b9a5e5e18fbf ACPI / init: Fix the ordering of acpi_reserve_resources()
(commit b9a5e5e18fbf introduced regressions some of which, but not
all, were addressed by commit 0f1b414d1907 and commit 7bc10388ccdd
was a fixup on top of the latter) and causes ACPI fixed hardware
resources to be reserved at the fs_initcall_sync stage of system
initialization.
The story is as follows. First, a boot regression was reported due
to an apparent resource reservation ordering change after a commit
that shouldn't lead to such changes. Investigation led to the
conclusion that the problem happened because acpi_reserve_resources()
was executed at the device_initcall() stage of system initialization
which wasn't strictly ordered with respect to driver initialization
(and with respect to the initialization of the pcieport driver in
particular), so a random change causing the device initcalls to be
run in a different order might break things.
The response to that was to attempt to run acpi_reserve_resources()
as soon as we knew that ACPI would be in use (commit b9a5e5e18fbf).
However, that turned out to be too early, because it caused resource
reservations made by the PNP system driver to fail on at least one
system and that failure was addressed by commit 0f1b414d1907.
That fix still turned out to be insufficient, though, because
calling acpi_reserve_resources() before the fs_initcall stage of
system initialization caused a boot regression to happen on the
eCAFE EC-800-H20G/S netbook. That meant that we only could call
acpi_reserve_resources() at the fs_initcall initialization stage
or later, but then we might just as well call it after the PNP
initalization in which case commit 0f1b414d1907 wouldn't be
necessary any more.
For this reason, the changes made by commit 0f1b414d1907 are reverted
(along with a memory leak fixup on top of that commit), the changes
made by commit b9a5e5e18fbf that went too far are reverted too and
acpi_reserve_resources() is changed into fs_initcall_sync, which
will cause it to be executed after the PNP subsystem initialization
(which is an fs_initcall) and before device initcalls (including
the pcieport driver initialization) which should avoid the initial
issue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
Link: http://marc.info/?t=143092384600002&r=1&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"Bug fixes (all for stable kernels) for ext4:
- address corner cases for indirect blocks->extent migration
- fix reserved block accounting invalidate_page when
page_size != block_size (i.e., ppc or 1k block size file systems)
- fix deadlocks when a memcg is under heavy memory pressure
- fix fencepost error in lazytime optimization"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: replace open coded nofail allocation in ext4_free_blocks()
ext4: correctly migrate a file with a hole at the beginning
ext4: be more strict when migrating to non-extent based file
ext4: fix reservation release on invalidatepage for delalloc fs
ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp
bufferhead: Add _gfp version for sb_getblk()
ext4: fix fencepost error in lazytime optimization
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"It's been a busy development cycle for target-core in a number of
different areas.
The fabric API usage for se_node_acl allocation is now within
target-core code, dropping the external API callers for all fabric
drivers tree-wide.
There is a new conversion to RCU hlists for se_node_acl and
se_portal_group LUN mappings, that turns fast-past LUN lookup into a
completely lockless code-path. It also removes the original
hard-coded limitation of 256 LUNs per fabric endpoint.
The configfs attributes for backends can now be shared between core
and driver code, allowing existing drivers to use common code while
still allowing flexibility for new backend provided attributes.
The highlights include:
- Merge sbc_verify_dif_* into common code (sagi)
- Remove iscsi-target support for obsolete IFMarker/OFMarker
(Christophe Vu-Brugier)
- Add bidi support in target/user backend (ilias + vangelis + agover)
- Move se_node_acl allocation into target-core code (hch)
- Add crc_t10dif_update common helper (akinobu + mkp)
- Handle target-core odd SGL mapping for data transfer memory
(akinobu)
- Move transport ID handling into target-core (hch)
- Move task tag into struct se_cmd + support 64-bit tags (bart)
- Convert se_node_acl->device_list[] to RCU hlist (nab + hch +
paulmck)
- Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch +
paulmck)
- Simplify target backend driver registration (hch)
- Consolidate + simplify target backend attribute implementations
(hch + nab)
- Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch)
- Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab)
- Drop unnecessary core_tpg_register TFO parameter (nab)
- Use 64-bit LUNs tree-wide (hannes)
- Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits)
target: Bump core version to v5.0
target: remove target_core_configfs.h
target: remove unused TARGET_CORE_CONFIG_ROOT define
target: consolidate version defines
target: implement WRITE_SAME with UNMAP bit using ->execute_unmap
target: simplify UNMAP handling
target: replace se_cmd->execute_rw with a protocol_data field
target/user: Fix inconsistent kmap_atomic/kunmap_atomic
target: Send UA when changing LUN inventory
target: Send UA upon LUN RESET tmr completion
target: Send UA on ALUA target port group change
target: Convert se_lun->lun_deve_lock to normal spinlock
target: use 'se_dev_entry' when allocating UAs
target: Remove 'ua_nacl' pointer from se_ua structure
target_core_alua: Correct UA handling when switching states
xen-scsiback: Fix compile warning for 64-bit LUN
target: Remove TARGET_MAX_LUNS_PER_TRANSPORT
target: use 64-bit LUNs
target: Drop duplicate + unused se_dev_check_wce
target: Drop unnecessary core_tpg_register TFO parameter
...
|
|
Pull NTB updates from Jon Mason:
"This includes a pretty significant reworking of the NTB core code, but
has already produced some significant performance improvements.
An abstraction layer was added to allow the hardware and clients to be
easily added. This required rewriting the NTB transport layer for
this abstraction layer. This modification will allow future "high
performance" NTB clients.
In addition to this change, a number of performance modifications were
added. These changes include NUMA enablement, using CPU memcpy
instead of asyncdma, and modification of NTB layer MTU size"
* tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits)
NTB: Add split BAR output for debugfs stats
NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
NTB: Print driver name and version in module init
NTB: Increase transport MTU to 64k from 16k
NTB: Rename Intel code names to platform names
NTB: Default to CPU memcpy for performance
NTB: Improve performance with write combining
NTB: Use NUMA memory in Intel driver
NTB: Use NUMA memory and DMA chan in transport
NTB: Rate limit ntb_qp_link_work
NTB: Add tool test client
NTB: Add ping pong test client
NTB: Add parameters for Intel SNB B2B addresses
NTB: Reset transport QP link stats on down
NTB: Do not advance transport RX on link down
NTB: Differentiate transport link down messages
NTB: Check the device ID to set errata flags
NTB: Enable link for Intel root port mode in probe
NTB: Read peer info from local SPAD in transport
NTB: Split ntb_hw_intel and ntb_transport drivers
...
|
|
Pull kvm fixes from Paolo Bonzini:
"Except for the preempt notifiers fix, these are all small bugfixes
that could have been waited for -rc2. Sending them now since I was
taking care of Peter's patch anyway"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: add hyper-v crash msrs values
KVM: x86: remove data variable from kvm_get_msr_common
KVM: s390: virtio-ccw: don't overwrite config space values
KVM: x86: keep track of LVT0 changes under APICv
KVM: x86: properly restore LVT0
KVM: x86: make vapics_in_nmi_mode atomic
sched, preempt_notifier: separate notifier registration from static_key inc/dec
|
|
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Abstract the NTB device behind a programming interface, so that it can
support different hardware and client drivers.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq update from Thomas Gleixner:
"The last update for 4.2 is just moving a macro from a local header to
the global one, so it can be used in architecture code as well.
Cleanup of the now empty local header is 4.3 material"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Move IRQCHIP_DECLARE macro to include/linux/irqchip.h
|