Age | Commit message (Collapse) | Author |
|
svc_unregister() call have to be removed from svc_destroy() since it will be
called in sv_shutdown callback.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
New function ("svc_uses_rpcbind") will be used to detect, that new service will
send portmapper register calls. For such services we will create rpcbind
clients and remove all stale portmap registrations.
Also, svc_rpcb_cleanup() will be set as sv_shutdown callback for such services
in case of this field wasn't initialized earlier. This will allow to destroy
rpcbind clients when no other users of them left.
Note: Currently, any creating service will be detected as portmap user.
Probably, this is wrong. But now it depends on program versions "vs_hidden"
flag.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This helpers will be used only for those services, that will send portmapper
registration calls.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
All is simple: we just increase users counter if rpcbind clients has been
created already. Otherwise we create them and set users counter to 1.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
v6:
1) added write memory barrier to rpcb_set_local to make sure, that rpcbind
clients become valid before rpcb_users assignment
2) explicitly set rpcb_users to 1 instead of incrementing it (looks clearer from
my pow).
v5: fixed races with rpcb_users in rpcb_get_local()
This helpers will be used for dynamical creation and destruction of rpcbind
clients.
Variable rpcb_users is actually a counter of lauched RPC services. If rpcbind
clients has been created already, then we just increase rpcb_users.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
Revert "memory hotplug: Correct page reservation checking"
Update email address for stable patch submission
dynamic_debug: fix undefined reference to `__netdev_printk'
dynamic_debug: use a single printk() to emit messages
dynamic_debug: remove num_enabled accounting
dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
uio: Support physical addresses >32 bits on 32-bit systems
sysfs: add unsigned long cast to prevent compile warning
drivers: base: print rejected matches with DEBUG_DRIVER
memory hotplug: Correct page reservation checking
memory hotplug: Refuse to add unaligned memory regions
remove the messy code file Documentation/zh_CN/SubmitChecklist
ARM: mxc: convert device creation to use platform_device_register_full
new helper to create platform devices with dma mask
docs/driver-model: Update device class docs
docs/driver-model: Document device.groups
kobj_uevent: Ignore if some listeners cannot handle message
dynamic_debug: make netif_dbg() call __netdev_printk()
dynamic_debug: make netdev_dbg() call __netdev_printk()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
MAINTAINERS: linux-m32r is moderated for non-subscribers
linux@lists.openrisc.net is moderated for non-subscribers
Drop default from "DM365 codec select" choice
parisc: Kconfig: cleanup Kernel page size default
Kconfig: remove redundant CONFIG_ prefix on two symbols
cris: remove arch/cris/arch-v32/lib/nand_init.S
microblaze: add missing CONFIG_ prefixes
h8300: drop puzzling Kconfig dependencies
MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
tty: drop superfluous dependency in Kconfig
ARM: mxc: fix Kconfig typo 'i.MX51'
Fix file references in Kconfig files
aic7xxx: fix Kconfig references to READMEs
Fix file references in drivers/ide/
thinkpad_acpi: Fix printk typo 'bluestooth'
bcmring: drop commented out line in Kconfig
btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
doc: raw1394: Trivial typo fix
CIFS: Don't free volume_info->UNC until we are entirely done with it.
treewide: Correct spelling of successfully in comments
...
|
|
The sunrpc layer keeps a cache of recently used credentials and
'unx_match' is used to find the credential which matches the current
process.
However unx_match allows a match when the cached credential has extra
groups at the end of uc_gids list which are not in the process group list.
So if a process with a list of (say) 4 group accesses a file and gains
access because of the last group in the list, then another process
with the same uid and gid, and a gid list being the first tree of the
gids of the original process tries to access the file, it will be
granted access even though it shouldn't as the wrong rpc credential
will be used.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
|
|
When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
generated locally. This is useful since it allows the kernel to auto-configure
its own interface addresses.
However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
locally generated RAs announce the default route and/or other route information,
the kernel happily inserts bogus routes with its own address as gateway.
With this patch, adding routes from an RA will be skiped when the RAs source
address matches any local address, just as if 'accept_ra_defrtr' and
'accept_ra_rtr_pref' were set to 0.
Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dan Siemon would like to add tunnelling support to cls_flow
This preliminary patch introduces use of skb_header_pointer() to help
this task, while avoiding skb head reallocation because of deep packet
inspection.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In func ipv4_dst_check,check_peer_pmtu should be called only when peer is updated.
So,if the peer is not updated in ip_rt_frag_needed,we can not inc __rt_peer_genid.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
It was enabled by default and the messages guarded
by the define are useful.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This helps in more control over debugging.
root@qemu-img-64:~# ls /pass/123
ls: cannot access /pass/123: No such file or directory
root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
ls-1536 [001] 70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1)
000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01
010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00
ls-1536 [001] 70.928587: <stack trace>
=> trace_9p_protocol_dump
=> p9pdu_finalize
=> p9_client_rpc
=> p9_client_walk
=> v9fs_vfs_lookup
=> d_alloc_and_lookup
=> walk_component
=> path_lookupat
ls-1536 [000] 70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1)
000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00
010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00
ls-1536 [000] 70.929697: <stack trace>
=> trace_9p_protocol_dump
=> p9_client_rpc
=> p9_client_walk
=> v9fs_vfs_lookup
=> d_alloc_and_lookup
=> walk_component
=> path_lookupat
=> do_path_lookup
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Without this msize=4294967295 will result in a crash
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
Instead of saying all integer argument option should be listed in the beginning
move integer parsing to each option type.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
We dereferenced "req->tc" and "req->rc" before checking for NULL.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
* remove lot of update to different data structure
* add a seperate callback for zero copy request.
* above makes non zero copy code path simpler
* remove conditionalizing TREAD/TREADDIR/TWRITE in the zero copy path
* Fix the dotu p9_check_errors with zero copy. Add sufficient doc around
* Add support for both in and output buffers in zero copy callback
* pin and unpin pages in the same context
* use helpers instead of defining page offset and rest of page ourself
* Fix mem leak in p9_check_errors
* Remove 'E' and 'F' in p9pdu_vwritef
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.
In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.
Example of things that were broken :
- Routing using TOS as a selector
- Firewalls
- Trafic classification / shaping
We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.
Notes :
- We still reflect incoming TOS in RST messages.
- We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
- A patch is needed for IPv6
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Renato Westphal noticed that since commit a2835763e130c343ace5320c20d33c281e7097b7
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.
Fix this by adding the missing manual notification in dev_change_net_namespaces.
Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.
Cc: stable@kernel.org
Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit f39925dbde7788cfb96419c0f092b086aa325c0f
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pair of functions,
* skb_clone_tx_timestamp()
* skb_complete_tx_timestamp()
were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.
As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.
These functions first appeared in v2.6.36.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_md5_hash_header() writes into skb header a temporary zero value,
this might confuse other users of this area.
Since tcphdr is small (20 bytes), copy it in a temporary variable and
make the change in the copy.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.
For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.
Tests revealed the following results for ifconfig > /dev/null
* 1000 interfaces:
* 0.114s without patch
* 0.089s with patch
* 3000 interfaces:
* 0.489s without patch
* 0.110s with patch
* 5000 interfaces:
* 1.363s without patch
* 0.250s with patch
* 128000 interfaces (other setup):
* ~100s without patch
* ~30s with patch
Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.
- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
sockets already pretty much effectively allow you
to emulate socket transparency.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_fin() only needs socket pointer, we can remove skb and th params.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.
Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.
Reported-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 356f039822b (TCP: increase default initial receive
window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments.
Change tcp_fixup_rcvbuf() to reflect this change, even if no real change
is expected, since sysctl_tcp_rmem[1] = 87380 and this value
is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720)
Note: Since commit 356f039822b limited default window to maximum of
10*1460 and 2*MSS, we use same heuristic in this patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It seems ip_gre is able to change dev->needed_headroom on the fly.
Its is not legal unfortunately and triggers a BUG in raw_sendmsg()
skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev)
< another cpu change dev->needed_headromm (making it bigger)
...
skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));
We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because skb head is exhausted.
Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)
Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Timo Teräs <timo.teras@iki.fi>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.
In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:
6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when use dst_get_neighbour to get neighbour, we need
rcu_read_lock to protect, since dst_get_neighbour uses
rcu_dereference.
The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com>
[ 105.612095]
[ 105.612096] ===================================================
[ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 105.612101] ---------------------------------------------------
[ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check()
without protection!
[ 105.612105]
[ 105.612106] other info that might help us debug this:
[ 105.612106]
[ 105.612108]
[ 105.612108] rcu_scheduler_active = 1, debug_locks = 0
[ 105.612110] 1 lock held by dnsmasq/2618:
[ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>]
rtnl_lock+0x17/0x20
[ 105.612120]
[ 105.612121] stack backtrace:
[ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
[ 105.612125] Call Trace:
[ 105.612129] [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0
[ 105.612132] [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0
[ 105.612135] [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220
[ 105.612139] [<ffffffff81639298>] arp_req_set+0xb8/0x230
[ 105.612142] [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310
[ 105.612146] [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60
[ 105.612150] [<ffffffff8163fb75>] inet_ioctl+0x85/0x90
[ 105.612154] [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70
[ 105.612157] [<ffffffff815b55d3>] sock_ioctl+0x73/0x280
[ 105.612162] [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570
[ 105.612165] [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0
[ 105.612168] [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80
[ 105.612172] [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is just a cleanup.
My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
warn: check 'flen' for negative values
flen comes from the user. We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative. This is a bug, but it's not exploitable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c.
Signed-off-by: Kevin Wilson <wkevils@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv4: compat_ioctl is local to af_inet.c, make it static
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
we should decrease ops->unresolved_rules when deleting a unresolved rule.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.
The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs(). Changing
the rcu_barrier back to a synchronize_net is therefore safe.
Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.
This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat. /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10. There was no observable
difference in the amount of time it takes to destroy a network device.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It can trivially be replaced with rpc_restart_call_prepare.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change
tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in
initial sk_sndbuf
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility. We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The transparent socket option setting was not copied to the time wait
socket when an inet socket was being replaced by a time wait socket. This
broke the --transparent option of the socket match and may have caused
that FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state
were being dropped by the packet filter.
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.
Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Calling icmpv6_send() on a local message size error leads to
an incorrect update of the path mtu. So use xfrm6_local_rxpmtu()
to notify about the pmtu if the IPV6_DONTFRAG socket option is
set on an udp or raw socket, according RFC 3542 and use
ipv6_local_error() otherwise.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
On IPsec the effective mtu is lower because we need to add the protocol
headers and trailers later when we do the IPsec transformations. So after
the IPsec transformations the packet might be too big, which leads to a
slowpath fragmentation then. This patch fixes this by building the packets
based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
handling to this.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|