summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2008-10-31mac80211: fix two kernel-doc warningsJohannes Berg
One parameter wasn't described and one I forgot to update when renaming it; also update TBDs in sta_info. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: remove struct regdom hintingJohannes Berg
The code needs to be split out and cleaned up, so as a first step remove the capability, to add it back in a subsequent patch as a separate function. Also remove the publically facing return value of the function and the wiphy argument. A number of internal functions go from being generic helpers to just being used for alpha2 setting. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: make regdom passing semantics simplerJohannes Berg
The regdom struct is given to the core, so it might as well free it in error conditions. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: Re-enable aggregationSujith
Wireless HW without any dedicated queues for aggregation do not need the ampdu_queues mechanism present right now in mac80211. Since mac80211 is still incomplete wrt TX MQ changes, do not allow aggregation sessions for drivers that set ampdu_queues. This is only an interim hack until Intel fixes the requeue issue. Signed-off-by: Sujith <Sujith.Manoharan@atheros.com> Signed-off-by: Luis Rodriguez <Luis.Rodriguez@Atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: avoid some net/ieee80211.h vs. linux/ieee80211.h conflictsJohn W. Linville
There is quite a lot of overlap in definitions between these headers... Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: use individual buffers for printing ssid valuesJohn W. Linville
Also change escape_ssid to print_ssid to match print_mac semantics. Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: remove NETWORK_EMPTY_ESSID flagJohn W. Linville
It is unnecessary and of questionable value. Also remove is_empty_ssid, as it is also unnecessary. Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: consolidate on a single escape_essid implementationJohn W. Linville
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31wireless: don't publish __regulatory_hintJohannes Berg
This function requires an internal lock to be held, so it cannot be published to other modules in the kernel. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: fix a few typos in mac80211 kernel docBob Copeland
Correct a handful of errors found while reading the mac80211 book. Signed-off-by: Bob Copeland <me@bobcopeland.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31Add nl80211 commands to get and set o11s mesh networking parameterscolin@cozybit.com
The two new commands are NL80211_CMD_GET_MESH_PARAMS and NL80211_CMD_SET_MESH_PARAMS. There is a new attribute enum, NL80211_ATTR_MESH_PARAMS, which enumerates the various mesh configuration parameters. Moved struct mesh_config from mac80211/ieee80211_i.h to net/cfg80211.h. nl80211_get_mesh_params and nl80211_set_mesh_params unpack the netlink messages and ask the driver to get or set the configuration. This is done via two new function stubs, get_mesh_params and set_mesh_params, in struct cfg80211_ops. Signed-off-by: Colin McCabe <colin@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: remove rate_control_clearJohannes Berg
"Clearing" the rate control algorithm is pointless, none of the algorithms actually uses this operation and it's not even invoked properly for all channel switching. Also, there's no need to since rate control algorithms work per station. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211/drivers: rewrite the rate control APIJohannes Berg
So after the previous changes we were still unhappy with how convoluted the API is and decided to make things simpler for everybody. This completely changes the rate control API, now taking into account 802.11n with MCS rates and more control, most drivers don't support that though. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: rewrite HT handlingJohannes Berg
The HT handling has the following deficiencies, which I've (partially) fixed: * it always uses the AP info even if there is no AP, hence has no chance of working as an AP * it pretends to be HW config, but really is per-BSS * channel sanity checking is left to the drivers * it generally lets the driver control too much HT enabling is still wrong with this patch if you have more than one virtual STA mode interface, but that never happens currently. Once WDS, IBSS or AP/VLAN gets HT capabilities, it will also be wrong, see the comment in ieee80211_enable_ht(). Additionally, this fixes a number of bugs: * mac80211: ieee80211_set_disassoc doesn't notify the driver any more since the refactoring * iwl-agn-rs: always uses the HT capabilities from the wrong stuff mac80211 gives it rather than the actual peer STA * ath9k: a number of bugs resulting from the broken HT API I'm not entirely happy with putting the HT capabilities into struct ieee80211_sta as restricted to our own HT TX capabilities, but I see no cleaner solution for now. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: move bss_conf into vifJohannes Berg
Move bss_conf into the vif struct so that drivers can access it during ->tx without having to store it in the private data or similar. No driver updates because this is only for when they want to start using it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: make retry limits part of hw configJohannes Berg
Instead of having a separate callback, use the HW config callback with a new flag to change retry limits. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31nl80211: export HT capabilitiesJohannes Berg
This exports the local HT capabilities in nl80211. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: introduce hw config change flagsJohannes Berg
This makes mac80211 notify the driver which configuration actually changed, e.g. channel etc. No driver changes, this is just plumbing, driver authors are expected to act on this if they want to. Also remove the HW CONFIG debug printk, it's incorrect, often we configure something else. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: kill hw.conf.antenna_sel_{rx,tx}Johannes Berg
Never actually used. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31802.11: clean up/fix HT supportJohannes Berg
This patch cleans up a number of things: * the unusable definition of the HT capabilities/HT information information elements * variable names that are hard to understand * mac80211: move ieee80211_handle_ht to ht.c and remove the unused enable_ht parameter * mac80211: fix bug with MCS rate 32 in ieee80211_handle_ht * mac80211: fix bug with casting the result of ieee80211_bss_get_ie to an information element _contents_ rather than the whole element, add size checking (another out-of-bounds access bug fixed!) * mac80211: remove some unused return values in favour of BUG_ON checking * a few minor other things Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: fix short slot handlingJohannes Berg
This patch makes mac80211 handle short slot requests from the AP properly. Also warn about uses of IEEE80211_CONF_SHORT_SLOT_TIME and optimise out the code since it cannot ever be hit anyway. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: remove max_antenna_gain configJohannes Berg
The antenna gain isn't exactly configurable, despite the belief of some unnamed individual who thinks that the EEPROM might influence it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31mac80211: remove wiphy_to_hwJohannes Berg
This isn't used by anyone, if we ever need it we can add it back, until then it's useless. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-10-31misc: replace NIPQUAD()Harvey Harrison
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Add peek emulation for non-work-conserving qdiscs.Jarek Poplawski
This patch adds qdisc_peek_dequeued() wrapper to emulate peek method with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until dequeuing. This is mainly for compatibility reasons not to break some strange configs because peeking is expected for non-work-conserving parent qdiscs to query work-conserving child qdiscs. This implementation requires using qdisc_dequeue_peeked() wrapper instead of directly calling qdisc->dequeue() for all qdiscs ever querried with qdisc->ops->peek() or qdisc_peek_dequeued(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs.Patrick McHardy
From: Patrick McHardy <kaber@trash.net> Just as a demonstration how easy adding a peek operation to the work-conserving qdiscs actually is. It doesn't need to keep or change any internal state in many cases thanks to the guarantee that the packet will either be dequeued or, if another packet arrives, the upper qdisc will immediately ->peek again to reevaluate the state. (This is only slightly modified Patrick's patch.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31pkt_sched: sch_generic: Add Qdisc_ops peek() method.Jarek Poplawski
Add Qdisc_ops peek() method in order to replace requeuing. Based on ideas and patches of Herbert Xu, Patrick McHardy and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31xfrm: remove unused struct xfrm_policy::nextAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/p54/p54common.c
2008-10-30netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsysAlexey Dobriyan
netns ops which are registered with register_pernet_gen_device() are shutdown strictly before those which are registered with register_pernet_subsys(). Sometimes this leads to opposite (read: buggy) shutdown ordering between two modules. Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules which aren't elite enough for entry in struct net, and which can't use register_pernet_gen_device(). PPTP conntracking module is such one. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30net: delete excess kernel-doc notationRandy Dunlap
Remove excess kernel-doc function parameters from networking header & driver files: Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release' Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock' Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30Merge branch 'davem-fixes' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
2008-10-31gianfar: Fix race in TBI/SerDes configurationTrent Piepho
The init_phy() function attaches to the PHY, then configures the SerDes<->TBI link (in SGMII mode). The TBI is on the MDIO bus with the PHY (sort of) and is accessed via the gianfar's MDIO registers, using the functions gfar_local_mdio_read/write(), which don't do any locking. The previously attached PHY will start a work-queue on a timer, and probably an irq handler as well, which will talk to the PHY and thus use the MDIO bus. This uses phy_read/write(), which have locking, but not against the gfar_local_mdio versions. The result is that PHY code will try to use the MDIO bus at the same time as the SerDes setup code, corrupting the transfers. Setting up the SerDes before attaching to the PHY will insure that there is no race between the SerDes code and *our* PHY, but doesn't fix everything. Typically the PHYs for all gianfar devices are on the same MDIO bus, which is associated with the first gianfar device. This means that the first gianfar's SerDes code could corrupt the MDIO transfers for a different gianfar's PHY. The lock used by phy_read/write() is contained in the mii_bus structure, which is pointed to by the PHY. This is difficult to access from the gianfar drivers, as there is no link between a gianfar device and the mii_bus which shares the same MDIO registers. As far as the device layer and drivers are concerned they are two unrelated devices (which happen to share registers). Generally all gianfar devices' PHYs will be on the bus associated with the first gianfar. But this might not be the case, so simply locking the gianfar's PHY's mii bus might not lock the mii bus that the SerDes setup code is going to use. We solve this by having the code that creates the gianfar platform device look in the device tree for an mdio device that shares the gianfar's registers. If one is found the ID of its platform device is saved in the gianfar's platform data. A new function in the gianfar mii code, gfar_get_miibus(), can use the bus ID to search through the platform devices for a gianfar_mdio device with the right ID. The platform device's driver data is the mii_bus structure, which the SerDes setup code can use to lock the current bus. Signed-off-by: Trent Piepho <tpiepho@freescale.com> CC: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes: Fix incompatibility with versions of Perl less than 5.6.0 kbuild: do not include arch/<ARCH>/include/asm in find-sources twice. kbuild: tag with git revision when git describe is missing kbuild: prevent modpost from looking for a .cmd file for a static library linked into a module kbuild: fix KBUILD_EXTRA_SYMBOLS adjust init section definitions scripts/checksyscalls.sh: fix for non-gnu sed scripts/package: don't break if %{_smp_mflags} isn't set kbuild: setlocalversion: dont include svn change count kbuild: improve check-symlink kbuild: mkspec - fix build rpm
2008-10-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: add quirk entry for no-name keyboard (0x13ba/0x0017) HID: fix hid_device_id for cross compiling HID: sync on deleted io_retry timer in usbhid driver HID: fix oops during suspend of unbound HID devices
2008-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: amd8111e: Fix rx return code pktgen: fix multiple queue warning mac80211.h: fix kernel-doc excesses p54: fix build warnings ath5k: Reset key cache on interface up, thus fixing resume mac80211: correct warnings in minstrel rate control algorithm RFKILL: fix input layer initialisation p54: fix misbehavings when firmware can't be found dm9601: runtime mac address change support via-velocity: use driver string instead of dev->name before register_netdev() drivers/net/wan/syncppp: Fix unused-var warnings mlx4: Setting the correct offset for default mac address mlx4_en: remove duplicated #include ibm_newemac: Fix typo in flow control config option ehea: Detect 16GB hugepages for firmware restriction dmfe: check pci_alloc_consistent errors qeth: avoid skb_under_panic for malformatted inbound data qeth: remove unnecessary support ckeck in sysfs route6 qeth: fix offset error in non prealloc header path qeth: remove non-recover-thread checkings
2008-10-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: ASoC: Fix WM9713 ALC Decay Time name ALSA: ASoC: Fix some minor errors in mpc5200 psc i2s driver ALSA: ASoC: Fix mono controls after conversion to support full int masks ALSA: sound/ice1712: indentation & braces disagree - add braces ALSA: usb - Add quirk for Edirol UA-25EX advanced modes sound: struct device - replace bus_id with dev_name(), dev_set_name() ALSA: hda - Add reboot notifier ALSA: Warn when control names are truncated ALSA: intel8x0 - add Dell Optiplex GX620 (AD1981B) to AC97 clock whitelist ALSA: hda - Fix SPDIF mute on IDT/STAC codecs ALSA: hda: Add HDA vendor ID for Wolfson Microelectronics ALSA: hda - Add another HP model for AD1884A
2008-10-30spi: fix compile errorFernando Luis Vazquez Cao
Fix compile error below: LD drivers/spi/built-in.o CC [M] drivers/spi/spi_gpio.o In file included from drivers/spi/spi_gpio.c:26: include/linux/spi/spi_bitbang.h:23: error: field `work' has incomplete type make[2]: *** [drivers/spi/spi_gpio.o] Error 1 make[1]: *** [drivers/spi] Error 2 make: *** [drivers] Error 2 Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30nfsd: fix vm overcommit crashAlan Cox
Junjiro R. Okajima reported a problem where knfsd crashes if you are using it to export shmemfs objects and run strict overcommit. In this situation the current->mm based modifier to the overcommit goes through a NULL pointer. We could simply check for NULL and skip the modifier but we've caught other real bugs in the past from mm being NULL here - cases where we did need a valid mm set up (eg the exec bug about a year ago). To preserve the checks and get the logic we want shuffle the checking around and add a new helper to the vm_ security wrappers Also fix a current->mm reference in nommu that should use the passed mm [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix build] Reported-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30kernel.h: fix might_sleep kernel-docRandy Dunlap
Put the kernel-doc for might_sleep() _immediately_ before the macro (no intervening lines). Otherwise kernel-doc complains like so: Warning(linux-2.6.27-rc3-git2//include/linux/kernel.h:129): No description found for parameter 'file' Warning(linux-2.6.27-rc3-git2//include/linux/kernel.h:129): No description found for parameter 'line' because kernel-doc is looking at the wrong function prototype (i.e., __might_sleep). [Yes, I have a todo note to myself to check/warn for that inconsistency in scripts/kernel-doc.] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30fs: remove prepare_write/commit_writeNick Piggin
Nothing uses prepare_write or commit_write. Remove them from the tree completely. [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30cgroups: tiny cleanupsLi Zefan
- remove 'private' field from struct subsys - remove cgroup_init_smp() Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30freezer_cg: use thaw_process() in unfreeze_cgroup()Li Zefan
Don't duplicate the implementation of thaw_process(). [akpm@linux-foundation.org: make __thaw_process() static] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30mm: increase the default mlock limit from 32k to 64kKurt Garloff
By default, non-privileged tasks can only mlock() a small amount of memory to avoid a DoS attack by ordinary users. The Linux kernel defaulted to 32k (on a 4k page size system) to accommodate the needs of gpg. However, newer gpg2 needs 64k in various circumstances and otherwise fails miserably, see bnc#329675. Change the default to 64k, and make it more agnostic to PAGE_SIZE. Signed-off-by: Kurt Garloff <garloff@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30Merge branches 'topic/fix/misc' and 'topic/fix/asoc' into for-linusTakashi Iwai
2008-10-30ALSA: ASoC: Fix mono controls after conversion to support full int masksMark Brown
When ASoC was converted to support full int width masks SOC_SINGLE_VALUE() omitted the assignment of rshift, causing the control operatins to report some mono controls as stereo. This happened to work some of the time due to a confusion between shift and min in snd_soc_info_volsw(). Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-10-29adjust init section definitionsJan Beulich
Add rodata equivalents for assembly use, and fix the section attributes used by __REFCONST. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-10-29net: replace %p6 with %pI6Harvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: introduce sk_for_each_rcu_safenext()Eric Dumazet
Corey Minyard found a race added in commit 271b72c7fa82c2c7a795bc16896149933110672d (udp: RCU handling for Unicast packets.) "If the socket is moved from one list to another list in-between the time the hash is calculated and the next field is accessed, and the socket has moved to the end of the new list, the traversal will not complete properly on the list it should have, since the socket will be on the end of the new list and there's not a way to tell it's on a new list and restart the list traversal. I think that this can be solved by pre-fetching the "next" field (with proper barriers) before checking the hash." This patch corrects this problem, introducing a new sk_for_each_rcu_safenext() macro. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: RCU handling for Unicast packets.Eric Dumazet
Goals are : 1) Optimizing handling of incoming Unicast UDP frames, so that no memory writes should happen in the fast path. Note: Multicasts and broadcasts still will need to take a lock, because doing a full lockless lookup in this case is difficult. 2) No expensive operations in the socket bind/unhash phases : - No expensive synchronize_rcu() calls. - No added rcu_head in socket structure, increasing memory needs, but more important, forcing us to use call_rcu() calls, that have the bad property of making sockets structure cold. (rcu grace period between socket freeing and its potential reuse make this socket being cold in CPU cache). David did a previous patch using call_rcu() and noticed a 20% impact on TCP connection rates. Quoting Cristopher Lameter : "Right. That results in cacheline cooldown. You'd want to recycle the object as they are cache hot on a per cpu basis. That is screwed up by the delayed regular rcu processing. We have seen multiple regressions due to cacheline cooldown. The only choice in cacheline hot sensitive areas is to deal with the complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU." - Because udp sockets are allocated from dedicated kmem_cache, use of SLAB_DESTROY_BY_RCU can help here. Theory of operation : --------------------- As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()), special attention must be taken by readers and writers. Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed, reused, inserted in a different chain or in worst case in the same chain while readers could do lookups in the same time. In order to avoid loops, a reader must check each socket found in a chain really belongs to the chain the reader was traversing. If it finds a mismatch, lookup must start again at the begining. This *restart* loop is the reason we had to use rdlock for the multicast case, because we dont want to send same message several times to the same socket. We use RCU only for fast path. Thus, /proc/net/udp still takes spinlocks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>