Age | Commit message (Collapse) | Author |
|
Just use .call_rcu instead. We can drop the rcu read lock
after obtaining a reference and re-acquire on return.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fixes the following sparse warning:
net/netfilter/nf_nat_core.c:1039:20: warning:
symbol 'nat_hook' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
synchronize_rcu() is expensive.
The commit phase currently enforces an unconditional
synchronize_rcu() after incrementing the generation counter.
This is to make sure that a packet always sees a consistent chain, either
nft_do_chain is still using old generation (it will skip the newly added
rules), or the new one (it will skip old ones that might still be linked
into the list).
We could just remove the synchronize_rcu(), it would not cause a crash but
it could cause us to evaluate a rule that was removed and new rule for the
same packet, instead of either-or.
To resolve this, add rule pointer array holding two generations, the
current one and the future generation.
In commit phase, allocate the rule blob and populate it with the rules that
will be active in the new generation.
Then, make this rule blob public, replacing the old generation pointer.
Then the generation counter can be incremented.
nft_do_chain() will either continue to use the current generation
(in case loop was invoked right before increment), or the new one.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
->commit() cannot fail at the moment.
Followup-patch adds kmalloc calls in the commit phase, so we'll need
to be able to handle errors.
Make it so that -EGAIN causes a full replay, and make other errors
cause the transaction to fail.
Failing is ok from a consistency point of view as long as we
perform all actions that could return an error before
we increment the generation counter and the base seq.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Similar to previous patch, this time, merge redirect+nat.
The redirect module is just 2k in size, get rid of it and make
redirect part available from the nat core.
before:
text data bss dec hex filename
19461 1484 4138 25083 61fb net/netfilter/nf_nat.ko
1236 792 0 2028 7ec net/netfilter/nf_nat_redirect.ko
after:
20340 1508 4138 25986 6582 net/netfilter/nf_nat.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.
2) Add support for map lookups to numgen, random and hash expressions,
from Laura Garcia.
3) Allow to register nat hooks for iptables and nftables at the same
time. Patchset from Florian Westpha.
4) Timeout support for rbtree sets.
5) ip6_rpfilter works needs interface for link-local addresses, from
Vincent Bernat.
6) Add nf_ct_hook and nf_nat_hook structures and use them.
7) Do not drop packets on packets raceing to insert conntrack entries
into hashes, this is particularly a problem in nfqueue setups.
8) Address fallout from xt_osf separation to nf_osf, patches
from Florian Westphal and Fernando Mancera.
9) Remove reference to struct nft_af_info, which doesn't exist anymore.
From Taehee Yoo.
This batch comes with is a conflict between 25fd386e0bc0 ("netfilter:
core: add missing __rcu annotation") in your tree and 2c205dd3981f
("netfilter: add struct nf_nat_hook and use it") coming in this batch.
This conflict can be solved by leaving the __rcu tag on
__netfilter_net_init() - added by 25fd386e0bc0 - and remove all code
related to nf_nat_decode_session_hook - which is gone after
2c205dd3981f, as described by:
diff --cc net/netfilter/core.c
index e0ae4aae96f5,206fb2c4c319..168af54db975
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
#endif /* CONFIG_NF_CONNTRACK */
- static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
-#ifdef CONFIG_NF_NAT_NEEDED
-void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
-EXPORT_SYMBOL(nf_nat_decode_session_hook);
-#endif
-
+ static void __net_init
+ __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
{
int h;
I can also merge your net-next tree into nf-next, solve the conflict and
resend the pull request if you prefer so.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In nfqueue, two consecutive skbuffs may race to create the conntrack
entry. Hence, the one that loses the race gets dropped due to clash in
the insertion into the hashes from the nf_conntrack_confirm() path.
This patch adds a new nf_conntrack_update() function which searches for
possible clashes and resolve them. NAT mangling for the packet losing
race is corrected by using the conntrack information that won race.
In order to avoid direct module dependencies with conntrack and NAT, the
nf_ct_hook and nf_nat_hook structures are used for this purpose.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Move decode_session() and parse_nat_setup_hook() indirections to struct
nf_nat_hook structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Move the nf_ct_destroy indirection to the struct nf_ct_hook.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add garbage collection logic to expire elements stored in the rb-tree
representation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This reverts commit f92b40a8b2645
("netfilter: core: only allow one nat hook per hook point"), this
limitation is no longer needed. The nat core now invokes these
functions and makes sure that hook evaluation stops after a mapping is
created and a null binding is created otherwise.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently the packet rewrite and instantiation of nat NULL bindings
happens from the protocol specific nat backend.
Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.
Invocation looks like this (simplified):
NF_HOOK()
|
`---iptable_nat
|
`---> nf_nat_l3proto_ipv4 -> nf_nat_packet
|
new packet? pass skb though iptables nat chain
|
`---> iptable_nat: ipt_do_table
In nft case, this looks the same (nft_chain_nat_ipv4 instead of
iptable_nat).
This is a problem for two reasons:
1. Can't use iptables nat and nf_tables nat at the same time,
as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
NULL binding if do_table() did not find a matching nat rule so we
can detect post-nat tuple collisions).
2. If you use e.g. nft_masq, snat, redir, etc. uses must also register
an empty base chain so that the nat core gets called fro NF_HOOK()
to do the reverse translation, which is neither obvious nor user
friendly.
After this change, the base hook gets registered not from iptable_nat or
nftables nat hooks, but from the l3 nat core.
iptables/nft nat base hooks get registered with the nat core instead:
NF_HOOK()
|
`---> nf_nat_l3proto_ipv4 -> nf_nat_packet
|
new packet? pass skb through iptables/nftables nat chains
|
+-> iptables_nat: ipt_do_table
+-> nft nat chain x
`-> nft nat chain y
The nat core deals with null bindings and reverse translation.
When no mapping exists, it calls the registered nat lookup hooks until
one creates a new mapping.
If both iptables and nftables nat hooks exist, the first matching
one is used (i.e., higher priority wins).
Also, nft users do not need to create empty nat hooks anymore,
nat core always registers the base hooks that take care of reverse/reply
translation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This adds the infrastructure to register nat hooks with the nat core
instead of the netfilter core.
nat hooks are used to configure nat bindings. Such hooks are registered
from ip(6)table_nat or by the nftables core when a nat chain is added.
After next patch, nat hooks will be registered with nf_nat instead of
netfilter core. This allows to use many nat lookup functions at the
same time while doing the real packet rewrite (nat transformation) in
one place.
This change doesn't convert the intended users yet to ease review.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This will allow the nat core to reuse the nf_hook infrastructure
to maintain nat lookup functions.
The raw versions don't assume a particular hook location, the
functions get added/deleted from the hook blob that is passed to the
functions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Will be used in followup patch when nat types no longer
use nf_register_net_hook() but will instead register with the nat core.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Copy-pasted, both l3 helpers almost use same code here.
Split out the common part into an 'inet' helper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.
The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch creates new attributes to accept a map as argument and
then perform the lookup with the generated hash accordingly.
Both current hash functions are supported: Jenkins and Symmetric Hash.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch uses the map lookup already included to be applied
for random number generation.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nfnetlink tracing is available since nft 0.6 (June 2016).
Remove old nf_log based tracing to avoid rule counter in main loop.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently the -EBUSY error return path is not free'ing resources
allocated earlier, leaving a memory leak. Fix this by exiting via the
error exit label err5 that performs the necessary resource clean
up.
Detected by CoverityScan, CID#1432975 ("Resource leak")
Fixes: 9744a6fcefcb ("netfilter: nf_tables: check if same extensions are set when adding elements")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When removing a rule that jumps to chain and such chain in the same
batch, this bogusly hits EBUSY. Add activate and deactivate operations
to expression that can be called from the preparation and the
commit/abort phases.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
currently matchinfo gets stored in the expression, but some xt matches
are very large.
To handle those we either need to switch nft core to kvmalloc and increase
size limit, or allocate the info blob of large matches separately.
This does the latter, this limits the scope of the changes to
nft_compat.
I picked a threshold of 192, this allows most matches to work as before and
handle only few ones via separate alloation (cgroup, u32, sctp, rt).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Next patch will make it possible for *info to be stored in
a separate allocation instead of the expr private area.
This removes the 'expr priv area is info blob' assumption
from the match init/destroy/eval functions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nft_chain_stats_replace() and all other spots assume ->stats can be
NULL, but nft_update_chain_stats does not. It must do this check,
just because the jump label is set doesn't mean all basechains have stats
assigned.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
fixes these warnings:
'nfnl_cthelper_create' at net/netfilter/nfnetlink_cthelper.c:237:2,
'nfnl_cthelper_new' at net/netfilter/nfnetlink_cthelper.c:450:9:
./include/linux/string.h:246:9: warning: '__builtin_strncpy' specified bound 16 equals destination size [-Wstringop-truncation]
return __builtin_strncpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Moreover, strncpy assumes null-terminated source buffers, but thats
not the case here.
Unlike strlcpy, nla_strlcpy *does* pad the destination buffer
while also considering nla attribute size.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
removes following sparse error:
net/netfilter/core.c:598:30: warning: incorrect type in argument 1 (different address spaces)
net/netfilter/core.c:598:30: expected struct nf_hook_entries **e
net/netfilter/core.c:598:30: got struct nf_hook_entries [noderef] <asn:4>**<noident>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Local clients are not properly synchronized on 32-bit CPUs when
updating stats (3.10+). Now it is possible estimation_timer (timer),
a stats reader, to interrupt the local client in the middle of
write_seqcount_{begin,end} sequence leading to loop (DEADLOCK).
The same interrupt can happen from received packet (SoftIRQ)
which updates the same per-CPU stats.
Fix it by disabling BH while updating stats.
Found with debug:
WARNING: inconsistent lock state
4.17.0-rc2-00105-g35cb6d7-dirty #2 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage.
ftp/2545 [HC0[0]:SC0[0]:HE1:SE1] takes:
86845479 (&syncp->seq#6){+.+-}, at: ip_vs_schedule+0x1c5/0x59e [ip_vs]
{IN-SOFTIRQ-R} state was registered at:
lock_acquire+0x44/0x5b
estimation_timer+0x1b3/0x341 [ip_vs]
call_timer_fn+0x54/0xcd
run_timer_softirq+0x10c/0x12b
__do_softirq+0xc1/0x1a9
do_softirq_own_stack+0x1d/0x23
irq_exit+0x4a/0x64
smp_apic_timer_interrupt+0x63/0x71
apic_timer_interrupt+0x3a/0x40
default_idle+0xa/0xc
arch_cpu_idle+0x9/0xb
default_idle_call+0x21/0x23
do_idle+0xa0/0x167
cpu_startup_entry+0x19/0x1b
start_secondary+0x133/0x182
startup_32_smp+0x164/0x168
irq event stamp: 42213
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&syncp->seq#6);
<Interrupt>
lock(&syncp->seq#6);
*** DEADLOCK ***
Fixes: ac69269a45e8 ("ipvs: do not disable bh for long time")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Connections in One-packet scheduling mode (-o, --ops) are
removed with refcnt=0 because they are not hashed in conn table.
To avoid refcount_dec reporting this as error, change them to be
removed with refcount_dec_if_one as all other connections.
refcount_t hit zero at ip_vs_conn_put+0x31/0x40 [ip_vs]
in sh[15519], uid/euid: 497/497
WARNING: CPU: 0 PID: 15519 at ../kernel/panic.c:657
refcount_error_report+0x94/0x9e
Modules linked in: ip_vs_rr cirrus ttm sb_edac
edac_core drm_kms_helper crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel pcbc mousedev drm aesni_intel aes_x86_64
crypto_simd glue_helper cryptd psmouse evdev input_leds led_class
intel_agp fb_sys_fops syscopyarea sysfillrect intel_rapl_perf mac_hid
intel_gtt serio_raw sysimgblt agpgart i2c_piix4 i2c_core ata_generic
pata_acpi floppy cfg80211 rfkill button loop macvlan ip_vs
nf_conntrack libcrc32c crc32c_generic ip_tables x_tables ipv6
crc_ccitt autofs4 ext4 crc16 mbcache jbd2 fscrypto ata_piix libata
atkbd libps2 scsi_mod crc32c_intel i8042 rtc_cmos serio af_packet
dm_mod dax fuse xen_netfront xen_blkfront
CPU: 0 PID: 15519 Comm: sh Tainted: G W
4.15.17 #1-NixOS
Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
RIP: 0010:refcount_error_report+0x94/0x9e
RSP: 0000:ffffa344dde039c8 EFLAGS: 00010296
RAX: 0000000000000057 RBX: ffffffff92f20e06 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffa344dde165c0
RBP: ffffa344dde03b08 R08: 0000000000000218 R09: 0000000000000004
R10: ffffffff93006a80 R11: 0000000000000001 R12: ffffa344d68cd100
R13: 00000000000001f1 R14: ffffffff92f12fb0 R15: 0000000000000004
FS: 00007fc9d2040fc0(0000) GS:ffffa344dde00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000262a000 CR3: 0000000016a0c004 CR4: 00000000001606f0
Call Trace:
<IRQ>
ex_handler_refcount+0x4e/0x80
fixup_exception+0x33/0x40
do_trap+0x83/0x140
do_error_trap+0x83/0xf0
? ip_vs_conn_drop_conntrack+0x120/0x1a5 [ip_vs]
? ip_finish_output2+0x29c/0x390
? ip_finish_output2+0x1a2/0x390
invalid_op+0x1b/0x40
RIP: 0010:ip_vs_conn_put+0x31/0x40 [ip_vs]
RSP: 0000:ffffa344dde03bb8 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffa344df31cf00 RCX: ffffa344d7450198
RDX: 0000000000000003 RSI: 00000000fffffe01 RDI: ffffa344d7450140
RBP: 0000000000000002 R08: 0000000000000476 R09: 0000000000000000
R10: ffffa344dde03b28 R11: ffffa344df200000 R12: ffffa344d7d09000
R13: ffffa344def3a980 R14: ffffffffc04f6e20 R15: 0000000000000008
ip_vs_in.part.29.constprop.36+0x34f/0x640 [ip_vs]
? ip_vs_conn_out_get+0xe0/0xe0 [ip_vs]
ip_vs_remote_request4+0x47/0xa0 [ip_vs]
? ip_vs_in.part.29.constprop.36+0x640/0x640 [ip_vs]
nf_hook_slow+0x43/0xc0
ip_local_deliver+0xac/0xc0
? ip_rcv_finish+0x400/0x400
ip_rcv+0x26c/0x380
__netif_receive_skb_core+0x3a0/0xb10
? inet_gro_receive+0x23c/0x2b0
? netif_receive_skb_internal+0x24/0xb0
netif_receive_skb_internal+0x24/0xb0
napi_gro_receive+0xb8/0xe0
xennet_poll+0x676/0xb40 [xen_netfront]
net_rx_action+0x139/0x3a0
__do_softirq+0xde/0x2b4
irq_exit+0xae/0xb0
xen_evtchn_do_upcall+0x2c/0x40
xen_hvm_callback_vector+0x7d/0x90
</IRQ>
RIP: 0033:0x7fc9d11c91f9
RSP: 002b:00007ffebe8a2ea0 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffff0c
RAX: 00000000ffffffff RBX: 0000000002609808 RCX: 0000000000000054
RDX: 0000000000000001 RSI: 0000000002605440 RDI: 00000000025f940e
RBP: 00000000025f940e R08: 000000000260213d R09: 1999999999999999
R10: 000000000262a808 R11: 00000000025f942d R12: 00000000025f940e
R13: 00007fc9d1301e20 R14: 00000000025f9408 R15: 00007fc9d1302720
Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 e0 05 00
00 45 8b 84 24 38 04 00 00 41 89 c1 48 89 de 48 c7 c7 a8 2f f2 92 e8
7c fa ff ff <0f> 0b 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5
41 56
Reported-by: Net Filter <netfilternetfilter@gmail.com>
Fixes: b54ab92b84b6 ("netfilter: refcounter conversions")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Taehee Yoo reported following bug:
iptables-compat -I OUTPUT -m cpu --cpu 0
iptables-compat -F
lsmod |grep xt_cpu
xt_cpu 16384 1
Quote:
"When above command is given, a netlink message has two expressions that
are the cpu compat and the nft_counter.
The nft_expr_type_get() in the nf_tables_expr_parse() successes
first expression then, calls select_ops callback.
(allocates memory and holds module)
But, second nft_expr_type_get() in the nf_tables_expr_parse()
returns -EAGAIN because of request_module().
In that point, by the 'goto err1',
the 'module_put(info[i].ops->type->owner)' is called.
There is no release routine."
The core problem is that unlike all other expression,
nft_compat select_ops has side effects.
1. it allocates dynamic memory which holds an nft ops struct.
In all other expressions, ops has static storage duration.
2. It grabs references to the xt module that it is supposed to
invoke.
Depending on where things go wrong, error unwinding doesn't
always do the right thing.
In the above scenario, a new nft_compat_expr is created and
xt_cpu module gets loaded with a refcount of 1.
Due to to -EAGAIN, the netlink messages get re-parsed.
When that happens, nft_compat finds that xt_cpu is already present
and increments module refcount again.
This fixes the problem by making select_ops to have no visible
side effects and removes all extra module_get/put.
When select_ops creates a new nft_compat expression, the new
expression has a refcount of 0, and the xt module gets its refcount
incremented.
When error happens, the next call finds existing entry, but will no
longer increase the reference count -- the presence of existing
nft_xt means we already hold a module reference.
Because nft_xt_put is only called from nft_compat destroy hook,
it will never see the initial zero reference count.
->destroy can only be called after ->init(), and that will increase the
refcount.
Lastly, we now free nft_xt struct with kfree_rcu.
Else, we get use-after free in nf_tables_rule_destroy:
while (expr != nft_expr_last(rule) && expr->ops) {
nf_tables_expr_destroy(ctx, expr);
expr = nft_expr_next(expr); // here
nft_expr_next() dereferences expr->ops. This is safe
for all users, as ops have static storage duration.
In nft_compat case however, its ->destroy callback can
free the memory that hold the ops structure.
Tested-by: Taehee Yoo <ap420073@gmail.com>
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree, more relevant updates in this batch are:
1) Add Maglev support to IPVS. Moreover, store lastest server weight in
IPVS since this is needed by maglev, patches from from Inju Song.
2) Preparation works to add iptables flowtable support, patches
from Felix Fietkau.
3) Hand over flows back to conntrack slow path in case of TCP RST/FIN
packet is seen via new teardown state, also from Felix.
4) Add support for extended netlink error reporting for nf_tables.
5) Support for larger timeouts that 23 days in nf_tables, patch from
Florian Westphal.
6) Always set an upper limit to dynamic sets, also from Florian.
7) Allow number generator to make map lookups, from Laura Garcia.
8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat.
9) Extend ip6tables SRH match to support previous, next and last SID,
from Ahmed Abdelsalam.
10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez.
11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot.
12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables,
from Taehee Yoo.
13) Unify meta bridge with core nft_meta, then make nft_meta built-in.
Make rt and exthdr built-in too, again from Florian.
14) Missing initialization of tbl->entries in IPVS, from Cong Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This must now use a 64bit jiffies value, else we set
a bogus timeout on 32bit.
Fixes: 8e1102d5a1596 ("netfilter: nf_tables: support timeouts larger than 23 days")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
IPCTNL_MSG_CT_GET_STATS netlink command allow to monitor current number
of conntrack entries. However, if one wants to compare it with the
maximum (and detect exhaustion), the only solution is currently to read
sysctl value.
This patch add nf_conntrack_max value in netlink message, and simplify
monitoring for application built on netlink API.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The modulus in the hash function was limited to > 1 as initially
there was no sense to create a hashing of just one element.
Nevertheless, there are certain cases specially for load balancing
where this case needs to be addressed.
This patch fixes the following error.
Error: Could not process rule: Numerical result out of range
add rule ip nftlb lb01 dnat to jhash ip saddr mod 1 map { 0: 192.168.0.10 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The solution comes to force the hash to 0 when the modulus is 1.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
|
|
This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.
For this purpose, different ops have been included to extend the
current numgen inc functions.
Currently, only supported for numgen incremental operations, but
it will be supported for random in a follow-up patch.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
After processing the transaction log, the remaining entries of the log
need to be released.
However, in some cases no entries remain, e.g. because the transaction
did not remove anything.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ebtables uses find_match() rather than find_request_match in one case
(see bcf4934288402be3464110109a4dae3bd6fb3e93,
"netfilter: ebtables: Fix extension lookup with identical name"), so
extend the check on name length to those functions too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Dominique Martinet reported a TCP hang problem when simultaneous open was used.
The problem is that the tcp_conntracks state table is not smart enough
to handle the case. The state table could be fixed by introducing a new state,
but that would require more lines of code compared to this patch, due to the
required backward compatibility with ctnetlink.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Similarly, tbl->entries is not initialized after kmalloc(),
therefore causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.
Reported-by: <syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
tbl->entries is not initialized after kmalloc(), therefore
causes an uninit-value warning in ip_vs_lblc_check_expire()
as reported by syzbot.
Reported-by: <syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Simon Horman says:
====================
IPVS Updates for v4.18
please consider these IPVS enhancements for v4.18.
* Whitepace cleanup
* Add Maglev hashing algorithm as a IPVS scheduler
Inju Song says "Implements the Google's Maglev hashing algorithm as a
IPVS scheduler. Basically it provides consistent hashing but offers some
special features about disruption and load balancing.
1) minimal disruption: when the set of destinations changes,
a connection will likely be sent to the same destination
as it was before.
2) load balancing: each destination will receive an almost
equal number of connections.
Seel also: [3.4 Consistent Hasing] in
https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
"
* Fix to correct implementation of Knuth's multiplicative hashing
which is used in sh/dh/lblc/lblcr algorithms. Instead the
implementation provided by the hash_32() macro is used.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
before:
text data bss dec hex filename
5056 844 0 5900 170c net/netfilter/nft_exthdr.ko
102456 2316 401 105173 19ad5 net/netfilter/nf_tables.ko
after:
106410 2392 401 109203 1aa93 net/netfilter/nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
before:
text data bss dec hex filename
2657 844 0 3501 dad net/netfilter/nft_rt.ko
100826 2240 401 103467 1942b net/netfilter/nf_tables.ko
after:
2657 844 0 3501 dad net/netfilter/nft_rt.ko
102456 2316 401 105173 19ad5 net/netfilter/nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
size net/netfilter/nft_meta.ko
text data bss dec hex filename
5826 936 1 6763 1a6b net/netfilter/nft_meta.ko
96407 2064 400 98871 18237 net/netfilter/nf_tables.ko
after:
100826 2240 401 103467 1942b net/netfilter/nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
It overcomplicates things for no reason.
nft_meta_bridge only offers retrieval of bridge port interface name.
Because of this being its own module, we had to export all nft_meta
functions, which we can then make static again (which even reduces
the size of nft_meta -- including bridge port retrieval...):
before:
text data bss dec hex filename
1838 832 0 2670 a6e net/bridge/netfilter/nft_meta_bridge.ko
6147 936 1 7084 1bac net/netfilter/nft_meta.ko
after:
5826 936 1 6763 1a6b net/netfilter/nft_meta.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nft rejects rules that lack a timeout and a size limit when they're used
to add elements from packet path.
Pick a sane upperlimit instead of rejecting outright.
The upperlimit is visible to userspace, just as if it would have been
given during set declaration.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Marco De Benedetto says:
I would like to use a timeout of 30 days for elements in a set but it
seems there is a some kind of problem above 24d20h31m23s.
Fix this by using 'jiffies64' for timeout handling to get same behaviour
on 32 and 64bit systems.
nftables passes timeouts as u64 in milliseconds to the kernel,
but on kernel side we used a mixture of 'long' and jiffies conversions
rather than u64 and jiffies64.
Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1237
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|