Age | Commit message (Collapse) | Author |
|
For a picked up connection, the window win is scaled twice: one is by the
initialization code, and the other is by the sender updating code.
I use the temporary variable swin instead of modifying the variable win.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit dc1f8bf68b311b1537cb65893430b6796118498a ('netdev: change
transmit to limited range type') changed the required return type and
9a1654ba0b50402a6bd03c7b0fe9b0200a5ea7b1 ('net: Optimize
hard_start_xmit() return checking') changed the valid numerical
return values.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 08baf561083bc27a953aa087dd8a664bb2b88e8e ('net:
txq_trans_update() helper') made it unnecessary for most drivers to
set net_device::trans_start (or netdev_queue::trans_start).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commits d314774cf2cd5dfeb39a00d37deee65d4c627927 ('netdev: network
device operations infrastructure') and
008298231abbeb91bc7be9e8b078607b816d1a4a ('netdev: add more functions
to netdevice ops') moved and renamed net device operation pointers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commits e308a5d806c852f56590ffdd3834d0df0cbed8d7 ('netdev: Add
netdev->addr_list_lock protection.') and
e8a0464cc950972824e2e128028ae3db666ec1ed ('netdev: Allocate multiple
queues for TX.') introduced more fine-grained locks.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit bea3348eef27e6044b6161fd04c3152215f96411 ('[NET]: Make NAPI
polling independent of struct net_device objects.') removed the
automatic disabling of NAPI polling by dev_close(), and drivers
must now do this themselves.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e52ac3398c3d772d372b9b62ab408fd5eec96840 ('net: Use device
model to get driver name in skb_gso_segment()') removed the only
in-tree caller of ethtool ops that doesn't hold the RTNL lock.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marvell has agreed to do maintenance on the sky2 driver.
* Add the developer to the maintainers file
* Remove the old reference to the long gone (sk98lin) driver
* Rearrange to fit current topic organization
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a slave comes up, we're unsetting the current_arp_slave without
removing active flags from it, which can lead to situations where we have
more than one slave with active flags in active-backup mode.
To avoid this situation we must remove the active flags from a slave before
removing it as a current_arp_slave.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A phonet packet is limited to USHRT_MAX bytes, this is never checked during
tx which means that the user can specify any size he wishes, and the kernel
will attempt to allocate that size.
In the good case, it'll lead to the following warning, but it may also cause
the kernel to kick in the OOM and kill a random task on the server.
[ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
[ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46
[ 8921.756672] Call Trace:
[ 8921.758185] [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0
[ 8921.762868] [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20
[ 8921.765399] [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730
[ 8921.769226] [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20
[ 8921.771686] [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660
[ 8921.773919] [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240
[ 8921.776248] [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0
[ 8921.778294] [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0
[ 8921.780847] [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260
[ 8921.783179] [<ffffffff821b3c65>] __alloc_skb+0x75/0x170
[ 8921.784971] [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260
[ 8921.787111] [<ffffffff821b002e>] ? release_sock+0x7e/0x90
[ 8921.788973] [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20
[ 8921.791052] [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380
[ 8921.792931] [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180
[ 8921.794917] [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90
[ 8921.797053] [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70
[ 8921.798992] [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0
[ 8921.801395] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.803501] [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0
[ 8921.805505] [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140
[ 8921.807860] [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110
[ 8921.809986] [<ffffffff811958e7>] ? might_fault+0x97/0xa0
[ 8921.811998] [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90
[ 8921.814595] [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0
[ 8921.816702] [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200
[ 8921.818819] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[ 8921.820863] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.823318] [<ffffffff811e1926>] vfs_writev+0x46/0x60
[ 8921.825219] [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0
[ 8921.827127] [<ffffffff82658039>] system_call_fastpath+0x16/0x1b
[ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 2f533844242 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.
We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.
Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.
For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert array index from the loop bound to the loop index.
And remove the void type conversion to ip6_mc_del1_src() return
code, seem it is unnecessary, since ip6_mc_del1_src() does not
use __must_check similar attribute, no compiler will report the
warning when it is removed.
v2: enrich the commit header
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver uses a 2-order allocation, which is too much on architectures
like ppc64, which has a 64KiB page. This particular allocation is used
for large packet fragments that may have a size of 512, 1024, 4096 or
fill the whole allocation. So, a minimum size of 16384 is good enough
and will be the same size that is used in architectures of 4KiB sized
pages.
This will avoid allocation failures that we see when the system is under
stress, but still has plenty of memory, like the one below.
This will also allow us to set the interface MTU to higher values like
9000, which was not possible on ppc64 without this patch.
Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB
83137 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 10420096kB
Total swap = 10420096kB
107776 pages RAM
1184 pages reserved
147343 pages shared
28152 pages non-shared
netstat: page allocation failure. order:2, mode:0x4020
Call Trace:
[c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
[c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930
[c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170
[c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en]
[c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en]
[c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en]
[c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en]
[c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450
[c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290
[c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24
[c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110
[c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0
[c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Tested-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit (bfab27a stmmac: add the experimental PCI support) the
IFF_UNICAST_FLT flag has been removed from the stmmac_mac_device_setup()
function. This patch re-adds the flag.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch clears a warning message of "MDC/MDIO access timeout" which may
appear when interface is loaded due to missing clock setting before resetting
the LED, and starting periodic function too early.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a link problem on the second port of BCM57711 + BCM84823 boards due to
incorrect macro usage.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes a link problem on BCM57712 + BCM8727 designs in which the TX
laser is controller by GPIO, after 1.60.xx drivers were previously loaded.
On these designs the TX_LASER is enabled by logic AND between the PHY
(through MDIO), and the GPIO. When an old driver is used, it disables the
MDIO part, hence the GPIO control had no affect de facto.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix no-LED problem when link speed is 1G on BCM57712 + BCM8727 designs, by
removing a logic error checking for a different PHY.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix 578x0-SFI pre-emphasis settings per HW recommendations to achieve better
link strength.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BCM57810-KR link may not come up in 1G after running loopback test, so set
the relevant registers to their default values before starting KR autoneg.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix 57810-KR flow-control handling link is achieved via CL37 AN.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a problem in which PFC frames are not honored, due to incorrect link
attributes synchronization following PMF migration, and verify PFC XON is not
stuck from previous link change.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Modified from original patch from Chris.
The sky2 driver has to have 8 byte alignment of receive buffer
on some chip versions. On architectures which don't support efficient
unaligned access this doesn't work very well. The solution is to
just copy all received packets which is what the driver already
does for small packets.
This allows the driver to be used on the Tilera TILEmpower-Gx, since
the tile architecture doesn't currently handle kernel unaligned accesses,
just userspace.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current implemenation was buggy for slaves who use ndo_neigh_setup,
since the networking stack invokes the bonding device ndo entry (from
neigh_params_alloc) before any devices are enslaved, and the bonding
driver can't further delegate the call at that point in time. As a
result when bonding IPoIB devices, the neigh_cleanup hasn't been called.
Fix that by deferring the actual call into the slave ndo_neigh_setup
from the time the bonding neigh_setup is called.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 7d26bb103c4 "bonding: emit event when bonding changes MAC" didn't
take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding
actually changes the mac address (to all zeroes). As a result the neighbours
aren't deleted by the core networking code (which does so upon getting that
event).
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
an error if the user provides less bytes than the size of struct
sctp_event_subscribe.
Struct sctp_event_subscribe needs to be extended by an u8 for every
new event or notification type that is added.
This obviously makes getsockopt fail for binaries that are compiled
against an older versions of <net/sctp/user.h> which do not contain
all event types.
This patch changes getsockopt behaviour to no longer return an error
if not enough bytes are being provided by the user. Instead, it
returns as much of sctp_event_subscribe as fits into the provided buffer.
This leads to the new behavior that users see what they have been aware
of at compile time.
The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have to decrement the conntrack counter if we fail to access the
zone extension.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The error path misses putting the timeout object. This patch adds
new function xt_ct_tg_timeout_put() to put the timeout object.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
NAPI is disabled during suspend and needs to be enabled on resume. Without
this the driver locks up during resume in rtl_reset_work() trying to disable
NAPI again.
Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 621b4d6 updated the bnx2x driver to a new FW version, but lacked
a commit to a header file with changes to the firmware's interface.
The missing interface change causes iscsi and fcoe to misbehave with the
updated firmware.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes Auto Power Saving configuration in ip101a_config_init
which was broken as there is no phy register write followed after
setting IP101A_APS_ON flag.
This patch also fixes the return value of ip101a_config_init.
Without this patch ip101a_config_init returns 2 which is not an error
accroding to IS_ERR and the mac driver will continue accessing 2 as
valid pointer to phy_dev resulting in memory fault.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net
|
|
In rare circumstances, a descriptor writeback flush may not work if it
arrives on a specific clock cycle as a writeback request is going out.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
When the adapter is closed while it is simultaneously going through a
reset, it can cause a null-pointer dereference when the two different code
paths simultaneously cleanup up the Tx/Rx resources.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Fix up code so that changes in DCB settings
are detected only when ixgbe_dcbnl_set_all is called.
Previously, a series of 'change' commands followed by
a call to ixgbe_dcbnl_set_all() would always be handled
as a HW change - even if the net change was zero.
This patch checks for this case of no actual change and
skips going through the HW set process.
Without this fix, the link could reset and result in
a link flap.
The core change in this patch is to check for changes
in the ixgbe_copy_dcb_cfg() routine - and return
a bitmask of detected changes. The other
places where changes were detected previously can be removed.
Signed-off-by: Eric Multanen <eric.w.multanen@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Now the helper function from filter.c for negative offsets is exported,
it can be used it in the jit to handle negative offsets.
First modify the asm load helper functions to handle:
- know positive offsets
- know negative offsets
- any offset
then the compiler can be modified to explicitly use these helper
when appropriate.
This fixes the case of a negative X register and allows to lift
the restriction that bpf programs with negative offsets can't
be jited.
Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function is renamed to make it a little more clear what it does.
It is not added to any .h because it is not for general consumption, only for
bpf internal use (and so by the jits).
Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The explanation of ip_local_port_range in
Documentation/networking/ip-sysctl.txt contains several factual
errors:
- The default value of ip_local_port_range does not depend on the
amount of memory available in the system.
- tcp_tw_recycle is not enabled by default.
- 1024-4999 is not the default value.
- Etc.
Clean up the mess.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.
4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)
This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)
In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.
Call tcp_push() only if MSG_MORE is not set in the flags parameter.
This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.
In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For every transmitted packet, ppp_start_xmit() will stop the netdev
queue and then, if appropriate, restart it. This causes the TX softirq
to run, entirely gratuitously.
This is "only" a waste of CPU time in the normal case, but it's actively
harmful when the PPP device is a TEQL slave — the wakeup will cause the
offending device to receive the next TX packet from the TEQL queue, when
it *should* have gone to the next slave in the list. We end up seeing
large bursts of packets on just *one* slave device, rather than using
the full available bandwidth over all slaves.
This patch fixes the problem by *not* unconditionally stopping the queue
in ppp_start_xmit(). It adds a return value from ppp_xmit_process()
which indicates whether the queue should be stopped or not.
It *doesn't* remove the call to netif_wake_queue() from
ppp_xmit_process(), because other code paths (especially from
ppp_output_wakeup()) need it there and it's messy to push it out to the
other callers to do it based on the return value. So we leave it in
place — it's a no-op in the case where the queue wasn't stopped, so it's
harmless in the TX path.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit f04565ddf52 (dev: use name hash for dev_seq_ops) added a second
regression, as some devices are missing from /proc/net/dev if many
devices are defined.
When seq_file buffer is filled, the last ->next/show() method is
canceled (pos value is reverted to value prior ->next() call)
Problem is after above commit, we dont restart the lookup at right
position in ->start() method.
Fix this by removing the internal 'pos' pointer added in commit, since
we need to use the 'loff_t *pos' provided by seq_file layer.
This also reverts commit 5cac98dd0 (net: Fix corruption
in /proc/*/net/dev_mcast), since its not needed anymore.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mihai Maruseac <mmaruseac@ixiacom.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull drm update from Dave Airlie:
"This pull just contains a forward of the Intel fixes from Daniel.
The only annoyance is the RC6 enable, which really should have made
-next, but since Ubuntu are shipping it I reckon its getting a good
testing now by the time 3.4 comes out.
The pull from Daniel contains his pull message to me:
"A few patches for 3.4, major part is 3 regression fixes:
- ppgtt broke hibernate on snb/ivb. Somehow our QA claims that it
still works, which is why this has not been caught earlier.
- ppgtt flails in combination with dmar. I kinda expected this one :(
- fence handling bugfix for gen2/3. Iirc this one is about a year
old, fix curtesy Chris Wilson. I've created an shockingly simple
i-g-t test to catch this in the future."
Wrt regressions I've just got a report that gmbus (newly enabled
again in 3.4) is a bit noisy. I'm looking into this atm.
Also included are the rc6 enable patches for snb from Eugeni. I
wanted to include these in the main 3.4 pull but screwed it up.
Please hit me. Imo these kind of patches really should go in
before -rc1, but in thise case rc6 has brought us tons of press and
guinea pigs^W^W testers and ubuntu is already running with it. So
I estimate a pretty small chance for this to blow up.
And some smaller things:
- two minor locking snafus
- server gt2 ivb pciid
- 2 patches to sanitize the register state left behind by the bios
some more
- 2 new quirk entries
- cs readback trick against missed IRQs from ivb also enabled on snb
- sprite fix from Jesse"
Let's see if the "enable RC6 on sandybridge" finally works and sticks.
I've been enabling it by hand (i915.i915_enable_rc6=1) for several
months on my Macbook Air, and it definitely makes a difference (and has
worked for me). But every time we enabled it before it showed some odd
hw buglet for *somebody*.
This time it's all good, I'm sure.
* 'drm-fixes-intel' of git://people.freedesktop.org/~airlied/linux:
drm/i915: treat src w & h as fixed point in sprite handling code
drm/i915: no-lvds quirk on MSI DC500
drm/i915: Add lock on drm_helper_resume_force_mode
drm/i915: don't leak struct_mutex lock on ppgtt init failures
drm/i915: disable ppgtt on snb when dmar is enabled
drm/i915: add Ivy Bridge GT2 Server entries
drm/i915: properly clear SSC1 bit in the pch refclock init code
drm/i915: apply CS reg readback trick against missed IRQ on snb
drm/i915: quirk away broken OpRegion VBT
drm/i915: enable plain RC6 on Sandy Bridge by default
drm/i915: allow to select rc6 modes via kernel parameter
drm/i915: Mark untiled BLT commands as fenced on gen2/3
drm/i915: properly restore the ppgtt page directory on resume
drm/i915: Sanitize BIOS debugging bits from PIPECONF
|
|
Pull drm fixes from Dave Airlie:
"Mainly nouveau fixes, one for a regressions in -rc1, fixes for booting
on a ppc G5, and a Kconfig fix. Two radeon fixes, one oops, one s/r
fix. One udl mmap fix. And one core drm fix to stop bad fbdev apps
overwriting bits of ram."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: Validate requested virtual size against allocated fb size
drm/radeon: Don't dereference possibly-NULL pointer.
mm, drm/udl: fixup vma flags on mmap
drm/radeon/kms: fix fans after resume
nouveau/bios: Fix tracking of BIOS image data
nouveau: Fix crash when pci_ram_rom() returns a size of 0
drm/nouveau: select POWER_SUPPLY
drm/nouveau: inform userspace of relaxed kernel subchannel requirements
Revert "drm/nouveau: inform userspace of new kernel subchannel requirements"
drm/nouveau: oops, create m2mf for nvd9 too
|
|
Pull arch/microblaze fixes from Michal Simek.
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Fix ret_from_fork declaration
microblaze: Do not use tlb_skip in early_printk
microblaze: Add missing headers caused by disintegration asm/system.h
microblaze: Fix stack usage in PAGE_SIZE copy_tofrom_user
microblaze: Fix tlb_skip variable on noMMU system
microblaze: Fix __futex_atomic_op macro register usage
|