summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2013-11-14Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO core updates from Jens Axboe: "This is the pull request for the core changes in the block layer for 3.13. It contains: - The new blk-mq request interface. This is a new and more scalable queueing model that marries the best part of the request based interface we currently have (which is fully featured, but scales poorly) and the bio based "interface" which the new drivers for high IOPS devices end up using because it's much faster than the request based one. The bio interface has no block layer support, since it taps into the stack much earlier. This means that drivers end up having to implement a lot of functionality on their own, like tagging, timeout handling, requeue, etc. The blk-mq interface provides all these. Some drivers even provide a switch to select bio or rq and has code to handle both, since things like merging only works in the rq model and hence is faster for some workloads. This is a huge mess. Conversion of these drivers nets us a substantial code reduction. Initial results on converting SCSI to this model even shows an 8x improvement on single queue devices. So while the model was intended to work on the newer multiqueue devices, it has substantial improvements for "classic" hardware as well. This code has gone through extensive testing and development, it's now ready to go. A pull request is coming to convert virtio-blk to this model will be will be coming as well, with more drivers scheduled for 3.14 conversion. - Two blktrace fixes from Jan and Chen Gang. - A plug merge fix from Alireza Haghdoost. - Conversion of __get_cpu_var() from Christoph Lameter. - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven. - A fix for a race between request completion and the timeout handling from Jeff Moyer. This is what caused the merge conflict with blk-mq/core, in case you are looking at that. - A dm stacking fix from Mike Snitzer. - A code consolidation fix and duplicated code removal from Kent Overstreet. - A handful of block bug fixes from Mikulas Patocka, fixing a loop crash and memory corruption on blk cg. - Elevator switch bug fix from Tomoki Sekiyama. A heads-up that I had to rebase this branch. Initially the immutable bio_vecs had been queued up for inclusion, but a week later, it became clear that it wasn't fully cooked yet. So the decision was made to pull this out and postpone it until 3.14. It was a straight forward rebase, just pruning out the immutable series and the later fixes of problems with it. The rest of the patches applied directly and no further changes were made" * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits) block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO block: Do not call sector_div() with a 64-bit divisor kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup() block: Consolidate duplicated bio_trim() implementations block: Use rw_copy_check_uvector() block: Enable sysfs nomerge control for I/O requests in the plug list block: properly stack underlying max_segment_size to DM device elevator: acquire q->sysfs_lock in elevator_change() elevator: Fix a race in elevator switching and md device initialization block: Replace __get_cpu_var uses bdi: test bdi_init failure block: fix a probe argument to blk_register_region loop: fix crash if blk_alloc_queue fails blk-core: Fix memory corruption if blkcg_init_queue fails block: fix race between request completion and timeout handling blktrace: Send BLK_TN_PROCESS events to all running traces blk-mq: don't disallow request merges for req->special being set blk-mq: mq plug list breakage blk-mq: fix for flush deadlock ...
2013-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) The addition of nftables. No longer will we need protocol aware firewall filtering modules, it can all live in userspace. At the core of nftables is a, for lack of a better term, virtual machine that executes byte codes to inspect packet or metadata (arriving interface index, etc.) and make verdict decisions. Besides support for loading packet contents and comparing them, the interpreter supports lookups in various datastructures as fundamental operations. For example sets are supports, and therefore one could create a set of whitelist IP address entries which have ACCEPT verdicts attached to them, and use the appropriate byte codes to do such lookups. Since the interpreted code is composed in userspace, userspace can do things like optimize things before giving it to the kernel. Another major improvement is the capability of atomically updating portions of the ruleset. In the existing netfilter implementation, one has to update the entire rule set in order to make a change and this is very expensive. Userspace tools exist to create nftables rules using existing netfilter rule sets, but both kernel implementations will need to co-exist for quite some time as we transition from the old to the new stuff. Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have worked so hard on this. 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements to our pseudo-random number generator, mostly used for things like UDP port randomization and netfitler, amongst other things. In particular the taus88 generater is updated to taus113, and test cases are added. 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet and Yang Yingliang. 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin Sujir. 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet, Neal Cardwell, and Yuchung Cheng. 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary control message data, much like other socket option attributes. From Francesco Fusco. 7) Allow applications to specify a cap on the rate computed automatically by the kernel for pacing flows, via a new SO_MAX_PACING_RATE socket option. From Eric Dumazet. 8) Make the initial autotuned send buffer sizing in TCP more closely reflect actual needs, from Eric Dumazet. 9) Currently early socket demux only happens for TCP sockets, but we can do it for connected UDP sockets too. Implementation from Shawn Bohrer. 10) Refactor inet socket demux with the goal of improving hash demux performance for listening sockets. With the main goals being able to use RCU lookups on even request sockets, and eliminating the listening lock contention. From Eric Dumazet. 11) The bonding layer has many demuxes in it's fast path, and an RCU conversion was started back in 3.11, several changes here extend the RCU usage to even more locations. From Ding Tianhong and Wang Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav Falico. 12) Allow stackability of segmentation offloads to, in particular, allow segmentation offloading over tunnels. From Eric Dumazet. 13) Significantly improve the handling of secret keys we input into the various hash functions in the inet hashtables, TCP fast open, as well as syncookies. From Hannes Frederic Sowa. The key fundamental operation is "net_get_random_once()" which uses static keys. Hannes even extended this to ipv4/ipv6 fragmentation handling and our generic flow dissector. 14) The generic driver layer takes care now to set the driver data to NULL on device removal, so it's no longer necessary for drivers to explicitly set it to NULL any more. Many drivers have been cleaned up in this way, from Jingoo Han. 15) Add a BPF based packet scheduler classifier, from Daniel Borkmann. 16) Improve CRC32 interfaces and generic SKB checksum iterators so that SCTP's checksumming can more cleanly be handled. Also from Daniel Borkmann. 17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces using the interface MTU value. This helps avoid PMTU attacks, particularly on DNS servers. From Hannes Frederic Sowa. 18) Use generic XPS for transmit queue steering rather than internal (re-)implementation in virtio-net. From Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) random32: add test cases for taus113 implementation random32: upgrade taus88 generator to taus113 from errata paper random32: move rnd_state to linux/random.h random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized random32: add periodic reseeding random32: fix off-by-one in seeding requirement PHY: Add RTL8201CP phy_driver to realtek xtsonic: add missing platform_set_drvdata() in xtsonic_probe() macmace: add missing platform_set_drvdata() in mace_probe() ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe() ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh vlan: Implement vlan_dev_get_egress_qos_mask as an inline. ixgbe: add warning when max_vfs is out of range. igb: Update link modes display in ethtool netfilter: push reasm skb through instead of original frag skbs ip6_output: fragment outgoing reassembled skb properly MAINTAINERS: mv643xx_eth: take over maintainership from Lennart net_sched: tbf: support of 64bit rates ixgbe: deleting dfwd stations out of order can cause null ptr deref ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS ...
2013-11-13lib/genalloc: add a helper function for DMA buffer allocationNicolin Chen
When using pool space for DMA buffer, there might be duplicated calling of gen_pool_alloc() and gen_pool_virt_to_phys() in each implementation. Thus it's better to add a simple helper function, a compatible one to the common dma_alloc_coherent(), to save some code. Signed-off-by: Nicolin Chen <b42378@freescale.com> Cc: "Hans J. Koch" <hjk@hansjkoch.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Eric Miao <eric.y.miao@gmail.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Kevin Hilman <khilman@deeprootsystems.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sekhar Nori <nsekhar@ti.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13lib/digsig.c: use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))Duan Jiong
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13lib/vsprintf.c: document formats for dentry and struct fileOlof Johansson
Looks like these were added to Documentation/printk-formats.txt but not the in-file table. Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13lib/debugobjects.c: remove unnecessary work pending testXie XiuQi
Remove unnecessary work pending test before calling schedule_work(). It has been tested in queue_work_on() already. No functional changed. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13vsprintf: check real user/group id for %pKRyan Mallon
Some setuid binaries will allow reading of files which have read permission by the real user id. This is problematic with files which use %pK because the file access permission is checked at open() time, but the kptr_restrict setting is checked at read() time. If a setuid binary opens a %pK file as an unprivileged user, and then elevates permissions before reading the file, then kernel pointer values may be leaked. This happens for example with the setuid pppd application on Ubuntu 12.04: $ head -1 /proc/kallsyms 00000000 T startup_32 $ pppd file /proc/kallsyms pppd: In file /proc/kallsyms: unrecognized option 'c1000000' This will only leak the pointer value from the first line, but other setuid binaries may leak more information. Fix this by adding a check that in addition to the current process having CAP_SYSLOG, that effective user and group ids are equal to the real ids. If a setuid binary reads the contents of a file which uses %pK then the pointer values will be printed as NULL if the real user is unprivileged. Update the sysctl documentation to reflect the changes, and also correct the documentation to state the kptr_restrict=0 is the default. This is a only temporary solution to the issue. The correct solution is to do the permission check at open() time on files, and to replace %pK with a function which checks the open() time permission. %pK uses in printk should be removed since no sane permission check can be done, and instead protected by using dmesg_restrict. Signed-off-by: Ryan Mallon <rmallon@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13percpu: add test module for various percpu operationsGreg Thelen
Tests various percpu operations. Enable with CONFIG_PERCPU_TEST=m. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13mm: do not walk all of system memory during show_memMel Gorman
It has been reported on very large machines that show_mem is taking almost 5 minutes to display information. This is a serious problem if there is an OOM storm. The bulk of the cost is in show_mem doing a very expensive PFN walk to give us the following information Total RAM: Also available as totalram_pages Highmem pages: Also available as totalhigh_pages Reserved pages: Can be inferred from the zone structure Shared pages: PFN walk required Unshared pages: PFN walk required Quick pages: Per-cpu walk required Only the shared/unshared pages requires a full PFN walk but that information is useless. It is also inaccurate as page pins of unshared pages would be accounted for as shared. Even if the information was accurate, I'm struggling to think how the shared/unshared information could be useful for debugging OOM conditions. Maybe it was useful before rmap existed when reclaiming shared pages was costly but it is less relevant today. The PFN walk could be optimised a bit but why bother as the information is useless. This patch deletes the PFN walker and infers the total RAM, highmem and reserved pages count from struct zone. It omits the shared/unshared page usage on the grounds that it is useless. It also corrects the reporting of HighMem as HighMem/MovableOnly as ZONE_MOVABLE has similar problems to HighMem with respect to lowmem/highmem exhaustion. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-12Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle are: - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay! - optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra. - wait.h fixes and code reorganization from Peter Zijlstra - cfs_bandwidth fixes from Ben Segall - SMP load-balancer cleanups from Peter Zijstra - idle balancer improvements from Jason Low - other fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits) ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED stop_machine: Fix race between stop_two_cpus() and stop_cpus() sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus sched: Fix asymmetric scheduling for POWER7 sched: Move completion code from core.c to completion.c sched: Move wait code from core.c to wait.c sched: Move wait.c into kernel/sched/ sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() sched: Avoid throttle_cfs_rq() racing with period_timer stopping sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used sched: Remove extra put_online_cpus() inside sched_setaffinity() sched/rt: Fix task_tick_rt() comment sched/wait: Fix build breakage sched/wait: Introduce prepare_to_wait_event() sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too sched: Remove get_online_cpus() usage sched: Fix race in migrate_swap_stop() ...
2013-11-11random32: add test cases for taus113 implementationDaniel Borkmann
We generated a battery of 100 test cases from GSL taus113 implemention and compare the results from a particular seed and a particular iteration with our implementation in the kernel. We have verified on 32 and 64 bit machines that our taus113 kernel implementation gives same results as GSL taus113 implementation: [ 0.147370] prandom: seed boundary self test passed [ 0.148078] prandom: 100 self tests passed This is a Kconfig option that is disabled on default, just like the crc32 init selftests in order to not unnecessary slow down boot process. We also refactored out prandom_seed_very_weak() as it's now used in multiple places in order to reduce redundant code. GSL code we used for generating test cases: int i, j; srand(time(NULL)); for (i = 0; i < 100; ++i) { int iteration = 500 + (rand() % 500); gsl_rng_default_seed = rand() + 1; gsl_rng *r = gsl_rng_alloc(gsl_rng_taus113); printf("\t{ %lu, ", gsl_rng_default_seed); for (j = 0; j < iteration - 1; ++j) gsl_rng_get(r); printf("%u, %lu },\n", iteration, gsl_rng_get(r)); gsl_rng_free(r); } Joint work with Hannes Frederic Sowa. Cc: Florian Weimer <fweimer@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11random32: upgrade taus88 generator to taus113 from errata paperDaniel Borkmann
Since we use prandom*() functions quite often in networking code i.e. in UDP port selection, netfilter code, etc, upgrade the PRNG from Pierre L'Ecuyer's original paper "Maximally Equidistributed Combined Tausworthe Generators", Mathematics of Computation, 65, 213 (1996), 203--213 to the version published in his errata paper [1]. The Tausworthe generator is a maximally-equidistributed generator, that is fast and has good statistical properties [1]. The version presented there upgrades the 3 state LFSR to a 4 state LFSR with increased periodicity from about 2^88 to 2^113. The algorithm is presented in [1] by the very same author who also designed the original algorithm in [2]. Also, by increasing the state, we make it a bit harder for attackers to "guess" the PRNGs internal state. See also discussion in [3]. Now, as we use this sort of weak initialization discussed in [3] only between core_initcall() until late_initcall() time [*] for prandom32*() users, namely in prandom_init(), it is less relevant from late_initcall() onwards as we overwrite seeds through prandom_reseed() anyways with a seed source of higher entropy, that is, get_random_bytes(). In other words, a exhaustive keysearch of 96 bit would be needed. Now, with the help of this patch, this state-search increases further to 128 bit. Initialization needs to make sure that s1 > 1, s2 > 7, s3 > 15, s4 > 127. taus88 and taus113 algorithm is also part of GSL. I added a test case in the next patch to verify internal behaviour of this patch with GSL and ran tests with the dieharder 3.31.1 RNG test suite: $ dieharder -g 052 -a -m 10 -s 1 -S 4137730333 #taus88 $ dieharder -g 054 -a -m 10 -s 1 -S 4137730333 #taus113 With this seed configuration, in order to compare both, we get the following differences: algorithm taus88 taus113 rands/second [**] 1.61e+08 1.37e+08 sts_serial(4, 1st run) WEAK PASSED sts_serial(9, 2nd run) WEAK PASSED rgb_lagged_sum(31) WEAK PASSED We took out diehard_sums test as according to the authors it is considered broken and unusable [4]. Despite that and the slight decrease in performance (which is acceptable), taus113 here passes all 113 tests (only rgb_minimum_distance_5 in WEAK, the rest PASSED). In general, taus/taus113 is considered "very good" by the authors of dieharder [5]. The papers [1][2] states a single warm-up step is sufficient by running quicktaus once on each state to ensure proper initialization of ~s_{0}: Our selection of (s) according to Table 1 of [1] row 1 holds the condition L - k <= r - s, that is, (32 32 32 32) - (31 29 28 25) <= (25 27 15 22) - (18 2 7 13) with r = k - q and q = (6 2 13 3) as also stated by the paper. So according to [2] we are safe with one round of quicktaus for initialization. However we decided to include the warm-up phase of the PRNG as done in GSL in every case as a safety net. We also use the warm up phase to make the output of the RNG easier to verify by the GSL output. In prandom_init(), we also mix random_get_entropy() into it, just like drivers/char/random.c does it, jiffies ^ random_get_entropy(). random-get_entropy() is get_cycles(). xor is entropy preserving so it is fine if it is not implemented by some architectures. Note, this PRNG is *not* used for cryptography in the kernel, but rather as a fast PRNG for various randomizations i.e. in the networking code, or elsewhere for debugging purposes, for example. [*]: In order to generate some "sort of pseduo-randomness", since get_random_bytes() is not yet available for us, we use jiffies and initialize states s1 - s3 with a simple linear congruential generator (LCG), that is x <- x * 69069; and derive s2, s3, from the 32bit initialization from s1. So the above quote from [3] accounts only for the time from core to late initcall, not afterwards. [**] Single threaded run on MacBook Air w/ Intel Core i5-3317U [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps [3] http://thread.gmane.org/gmane.comp.encryption.general/12103/ [4] http://code.google.com/p/dieharder/source/browse/trunk/libdieharder/diehard_sums.c?spec=svn490&r=490#20 [5] http://www.phy.duke.edu/~rgb/General/dieharder.php Joint work with Hannes Frederic Sowa. Cc: Florian Weimer <fweimer@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11random32: add prandom_reseed_late() and call when nonblocking pool becomes ↵Hannes Frederic Sowa
initialized The Tausworthe PRNG is initialized at late_initcall time. At that time the entropy pool serving get_random_bytes is not filled sufficiently. This patch adds an additional reseeding step as soon as the nonblocking pool gets marked as initialized. On some machines it might be possible that late_initcall gets called after the pool has been initialized. In this situation we won't reseed again. (A call to prandom_seed_late blocks later invocations of early reseed attempts.) Joint work with Daniel Borkmann. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11random32: add periodic reseedingHannes Frederic Sowa
The current Tausworthe PRNG is never reseeded with truly random data after the first attempt in late_initcall. As this PRNG is used for some critical random data as e.g. UDP port randomization we should try better and reseed the PRNG once in a while with truly random data from get_random_bytes(). When we reseed with prandom_seed we now make also sure to throw the first output away. This suffices the reseeding procedure. The delay calculation is based on a proposal from Eric Dumazet. Joint work with Daniel Borkmann. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11random32: fix off-by-one in seeding requirementDaniel Borkmann
For properly initialising the Tausworthe generator [1], we have a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15. Commit 697f8d0348 ("random32: seeding improvement") introduced a __seed() function that imposes boundary checks proposed by the errata paper [2] to properly ensure above conditions. However, we're off by one, as the function is implemented as: "return (x < m) ? x + m : x;", and called with __seed(X, 1), __seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15 would be possible, whereas the lower boundary should actually be of at least 2, 8, 16, just as GSL does. Fix this, as otherwise an initialization with an unwanted seed could have the effect that Tausworthe's PRNG properties cannot not be ensured. Note that this PRNG is *not* used for cryptography in the kernel. [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps Joint work with Hannes Frederic Sowa. Fixes: 697f8d0348a6 ("random32: seeding improvement") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11Merge tag 'gfs2-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull gfs2 updates from Steven Whitehouse: "The main feature of interest this time is quota updates. There are some clean ups and some patches to use the new generic lru list code. There is still plenty of scope for some further changes in due course - faster lookups of quota structures is very much on the todo list. Also, a start has been made towards the more tricky issue of using the generic lru code with glocks, but that will have to be completed in a subsequent merge window. The other, more minor feature, is that there have been a number of performance patches which relate to block allocation. In particular they will improve performance when the disk is nearly full" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Use generic list_lru for quota GFS2: Rename quota qd_lru_lock qd_lock GFS2: Use reflink for quota data cache GFS2: Use lockref for glocks GFS2: Protect quota sync generation GFS2: Inline qd_trylock into gfs2_quota_unlock GFS2: Make two similar quota code fragments into a function GFS2: Remove obsolete quota tunable GFS2: Move gfs2_icbit_munge into quota.c GFS2: Speed up starting point selection for block allocation GFS2: Add allocation parameters structure GFS2: Clean up reservation removal GFS2: fix dentry leaks GFS2: new function gfs2_rbm_incr GFS2: Introduce rbm field bii GFS2: Do not reset flags on active reservations GFS2: introduce bi_blocks for optimization GFS2: optimize rbm_from_block wrt bi_start GFS2: d_splice_alias() can't return error
2013-11-08Merge branch 'blk-mq/core' into for-3.13/coreJens Axboe
Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-timeout.c
2013-11-07Revert "sysfs: drop kobj_ns_type handling"Linus Torvalds
This reverts commit cb26a311578e67769e92a39a0a63476533cb7e12. It mysteriously causes NetworkManager to not find the wireless device for me. As far as I can tell, Tejun *meant* for this commit to not make any semantic changes, but there clearly are some. So revert it, taking into account some of the calling convention changes that happened in this area in subsequent commits. Cc: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-07Merge tag 'tty-3.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here's the big tty/serial driver update for 3.13-rc1. There's some more minor n_tty work here, but nothing like previous kernel releases. Also some new driver ids, driver updates for new hardware, and other small things. All of this has been in linux-next for a while with no issues" * tag 'tty-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (84 commits) serial: omap: fix missing comma serial: sh-sci: Enable the driver on all ARM platforms serial: mfd: Staticize local symbols serial: omap: fix a few checkpatch warnings serial: omap: improve RS-485 performance mrst_max3110: fix unbalanced IRQ issue during resume serial: omap: Add support for optional wake-up serial: sirf: remove duplicate defines tty: xuartps: Fix build error when COMMON_CLK is not set tty: xuartps: Fix build error due to missing forward declaration tty: xuartps: Fix "may be used uninitialized" build warning serial: 8250_pci: add Pericom PCIe Serial board Support (12d8:7952/4/8) - Chip PI7C9X7952/4/8 tty: xuartps: Update copyright information tty: xuartps: Implement suspend/resume callbacks tty: xuartps: Dynamically adjust to input frequency changes tty: xuartps: Updating set_baud_rate() tty: xuartps: Force enable the UART in xuartps_console_write tty: xuartps: support 64 byte FIFO size tty: xuartps: Add polled mode support for xuartps tty: xuartps: Implement BREAK detection, add SYSRQ support ...
2013-11-07Merge tag 'driver-core-3.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / sysfs patches from Greg KH: "Here's the big driver core / sysfs update for 3.13-rc1. There's lots of dev_groups updates for different subsystems, as they all get slowly migrated over to the safe versions of the attribute groups (removing userspace races with the creation of the sysfs files.) Also in here are some kobject updates, devres expansions, and the first round of Tejun's sysfs reworking to enable it to be used by other subsystems as a backend for an in-kernel filesystem. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits) sysfs: rename sysfs_assoc_lock and explain what it's about sysfs: use generic_file_llseek() for sysfs_file_operations sysfs: return correct error code on unimplemented mmap() mdio_bus: convert bus code to use dev_groups device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name sysfs: separate out dup filename warning into a separate function sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c sysfs: remove unused sysfs_get_dentry() prototype sysfs: honor bin_attr.attr.ignore_lockdep sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr devres: restore zeroing behavior of devres_alloc() sysfs: fix sysfs_write_file for bin file input: gameport: convert bus code to use dev_groups input: serio: remove bus usage of dev_attrs input: serio: use DEVICE_ATTR_RO() i2o: convert bus code to use dev_groups memstick: convert bus code to use dev_groups tifm: convert bus code to use dev_groups virtio: convert bus code to use dev_groups ipack: convert bus code to use dev_groups ...
2013-11-04lib: crc32: reduce number of cases for crc32{, c}_combineDaniel Borkmann
We can safely reduce the number of test cases by a tenth. There is no particular need to run as many as we're running now for crc32{,c}_combine, that gives us still ~8000 tests we're doing if people run kernels with crc selftests enabled which is perfectly fine. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04lib: crc32: conditionally resched when running testcasesDaniel Borkmann
Fengguang reports that when crc32 selftests are running on startup, on some e.g. 32bit systems, we can get a CPU stall like "INFO: rcu_sched self-detected stall on CPU { 0} (t=2101 jiffies g=4294967081 c=4294967080 q=41)". As this is not intended, add a cond_resched() at the end of a test case to fix it. Introduced by efba721f63 ("lib: crc32: add test cases for crc32{, c}_combine routines"). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03lib: crc32: add test cases for crc32{, c}_combine routinesDaniel Borkmann
We already have 100 test cases for crcs itself, so split the test buffer with a-prio known checksums, and test crc of two blocks against crc of the whole block for the same results. Output/result with CONFIG_CRC32_SELFTEST=y: [ 2.687095] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64 [ 2.687097] crc32: self tests passed, processed 225944 bytes in 278177 nsec [ 2.687383] crc32c: CRC_LE_BITS = 64 [ 2.687385] crc32c: self tests passed, processed 225944 bytes in 141708 nsec [ 7.336771] crc32_combine: 113072 self tests passed [ 12.050479] crc32c_combine: 113072 self tests passed [ 17.633089] alg: No test for crc32 (crc32-pclmul) Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03lib: crc32: add functionality to combine two crc32{, c}s in GF(2)Daniel Borkmann
This patch adds a combinator to merge two or more crc32{,c}s into a new one. This is useful for checksum computations of fragmented skbs that use crc32/crc32c as checksums. The arithmetics for combining both in the GF(2) was taken and slightly modified from zlib. Only passing two crcs is insufficient as two crcs and the length of the second piece is needed for merging. The code is made generic, so that only polynomials need to be passed for crc32_le resp. crc32c_le. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03lib: crc32: clean up spacing in test casesDaniel Borkmann
This is nothing more but a whitepace cleanup, as 80 chars is not a hard but soft limit, and otherwise makes the test cases array really look ugly. So fix it up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-01Merge branch 'linus' into sched/coreIngo Molnar
Resolve cherry-picking conflicts: Conflicts: mm/huge_memory.c mm/memory.c mm/mprotect.c See this upstream merge commit for more details: 52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-31lib/scatterlist.c: don't flush_kernel_dcache_page on slab pageMing Lei
Commit b1adaf65ba03 ("[SCSI] block: add sg buffer copy helper functions") introduces two sg buffer copy helpers, and calls flush_kernel_dcache_page() on pages in SG list after these pages are written to. Unfortunately, the commit may introduce a potential bug: - Before sending some SCSI commands, kmalloc() buffer may be passed to block layper, so flush_kernel_dcache_page() can see a slab page finally - According to cachetlb.txt, flush_kernel_dcache_page() is only called on "a user page", which surely can't be a slab page. - ARCH's implementation of flush_kernel_dcache_page() may use page mapping information to do optimization so page_mapping() will see the slab page, then VM_BUG_ON() is triggered. Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled, and this patch fixes the bug by adding test of '!PageSlab(miter->page)' before calling flush_kernel_dcache_page(). Signed-off-by: Ming Lei <ming.lei@canonical.com> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Tested-by: Simon Baatz <gmbnomis@gmail.com> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [3.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-29Kconfig: make KOBJECT_RELEASE debugging require timer debuggingLinus Torvalds
Without the timer debugging, the delayed kobject release will just result in undebuggable oopses if it triggers any latent bugs. That doesn't actually help debugging at all. So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid having people enable one without the other. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-25percpu_ida: add an API to return free tagsShaohua Li
Add an API to return free tags, blk-mq-tag will use it. Note, this just returns a snapshot of free tags number. blk-mq-tag has two usages of it. One is for info output for diagnosis. The other is to quickly check if there are free tags for request dispatch checking. Neither requires very precise. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25percpu_ida: add percpu_ida_for_each_freeShaohua Li
Add a new API to iterate free ids. blk-mq-tag will use it. Note, this doesn't guarantee to iterate all free ids restrictly. Caller should be aware of this. blk-mq uses it to do sanity check for request timedout, so can tolerate the limitation. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25percpu_ida: make percpu_ida percpu size/batch configurableShaohua Li
Make percpu_ida percpu size/batch configurable. The block-mq-tag will use it. After block-mq uses percpu_ida to manage tags, performance is improved. My test is done in a 2 sockets machine, 12 process cross the 2 sockets. So if there is lock contention or ipi, should be stressed heavily. Testing is done for null-blk. hw_queue_depth nopatch iops patch iops 64 ~800k/s ~1470k/s 2048 ~4470k/s ~4340k/s Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25percpu_counter: make APIs irq safeShaohua Li
In my usage, sometimes the percpu APIs are called with irq locked, sometimes not. lockdep complains there is potential deadlock. Let's always use percpucounter lock in irq safe way. There should be no performance penality, as all those are slow code path. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-19Merge 3.12-rc6 into driver-core-nextGreg Kroah-Hartman
We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-16percpu_refcount: export symbolsMatias Bjorling
Export the interface to be used within modules. Signed-off-by: Matias Bjorling <m@bjorling.me> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16sysrq: Allow magic SysRq key functions to be disabled through KconfigBen Hutchings
Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE) into a Kconfig variable. Original version by Bastian Blank <waldi@debian.org>. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-15GFS2: Use lockref for glocksSteven Whitehouse
Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-11kobject: show debug info on delayed kobject releaseFengguang Wu
Useful for locating buggy drivers on kernel oops. It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is hopefully only enabled in debug kernels (like maybe the Fedora rawhide one, or at developers), so being a bit more verbose is likely ok. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-11Merge branch 'core/urgent' into sched/coreIngo Molnar
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10kobject: show debug info on delayed kobject releaseFengguang Wu
Useful for locating buggy drivers on kernel oops. It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is hopefully only enabled in debug kernels (like maybe the Fedora rawhide one, or at developers), so being a bit more verbose is likely ok. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-09Merge tag 'v3.12-rc4' into sched/coreIngo Molnar
Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree before applying more scheduler patches. Conflicts: arch/avr32/include/asm/Kbuild Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03kobject: grab an extra reference on kobject->sd to allow duplicate deletesTejun Heo
sysfs currently has a rather weird behavior regarding removals. A directory removal would delete all files directly under it but wouldn't recurse into subdirectories, which, while a bit inconsistent, seems to make sense at the first glance as each directory is supposedly associated with a kobject and each kobject can take care of the directory deletion; however, this doesn't really hold as we have groups which can be directories without a kobject associated with it and require explicit deletions. We're in the process of separating out sysfs from kboject / driver core and want a consistent behavior. A removal should delete either only the specified node or everything under it. I think it is helpful to support recursive atomic removal and later patches will implement it. Such change means that a sysfs_dirent associated with kobject may be deleted before the kobject itself is removed if one of its ancestor gets removed before it. As sysfs_remove_dir() puts the base ref, we may end up with dangling pointer on descendants. This can be solved by holding an extra reference on the sd from kobject. Acquire an extra reference on the associated sysfs_dirent on directory creation and put it after removal. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking changes from David Miller: 1) Multiply in netfilter IPVS can overflow when calculating destination weight. From Simon Kirby. 2) Use after free fixes in IPVS from Julian Anastasov. 3) SFC driver bug fixes from Daniel Pieczko. 4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov. 5) Locking and encapsulation fixes to serial line CAN driver, from Andrew Naujoks. 6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner, Eilon Greenstein, and Ariel Elior. 7) In lapb, if no other packets are outstanding, T1 timeouts actually stall things and no packet gets sent. Fix from Josselin Costanzi. 8) ICMP redirects should not make it to the socket error queues, from Duan Jiong. 9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka. 10) Fix setting of VLAN priority field on via-rhine driver, from Roget Luethi. 11) Fix TX stalls and VLAN promisc programming in be2net driver from Ajit Khaparde. 12) Packet padding doesn't get handled correctly in new usbnet SG support code, from Ming Lei. 13) Fix races in netdevice teardown wrt. network namespace closing. From Eric W. Biederman. 14) Fix potential missed initialization of net_secret if not TCP connections are openned. From Eric Dumazet. 15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from Aleksander Morgado. 16) skb_cow_head() can change skb->data and thus packet header pointers, don't use stale ip_hdr reference in ip_tunnel code. 17) Backend state transition handling fixes in xen-netback, from Paul Durrant. 18) Packet offset for AH protocol is handled wrong in flow dissector, from Eric Dumazet. 19) Taking down an fq packet scheduler instance can leave stale packets in the queues, fix from Eric Dumazet. 20) Fix performance regressions introduced by TCP Small Queues. From Eric Dumazet. 21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from Hannes Frederic Sowa. 22) Multicast timer handlers in ipv4 and ipv6 can be the last and final reference to the ipv4/ipv6 specific network device state, so use the reference put that will check and release the object if the reference hits zero. From Salam Noureddine. 23) Fix memory corruption in ip_tunnel driver, and use skb_push() instead of __skb_push() so that similar bugs are less hard to find. From Steffen Klassert. 24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from Nicolas Dichtel. 25) fq scheduler doesn't accurately rate limit in certain circumstances, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) pkt_sched: fq: rate limiting improvements ip6tnl: allow to use rtnl ops on fb tunnel sit: allow to use rtnl ops on fb tunnel ip_tunnel: Remove double unregister of the fallback device ip_tunnel_core: Change __skb_push back to skb_push ip_tunnel: Add fallback tunnels to the hash lists ip_tunnel: Fix a memory corruption in ip_tunnel_xmit qlcnic: Fix SR-IOV configuration ll_temac: Reset dma descriptors indexes on ndo_open skbuff: size of hole is wrong in a comment ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put ethernet: moxa: fix incorrect placement of __initdata tag ipv6: gre: correct calculation of max_headroom powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file" bonding: Fix broken promiscuity reference counting issue tcp: TSQ can use a dynamic limit dm9601: fix IFF_ALLMULTI handling pkt_sched: fq: qdisc dismantle fixes ...
2013-09-29Merge 3.12-rc3 into driver-core-nextGreg Kroah-Hartman
We want the driver core and sysfs fixes in here to make merges and development easier. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-28lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()Heiko Carstens
Make use of arch_mutex_cpu_relax() so architectures can override the default cpu_relax() semantics. This is especially useful for s390, where cpu_relax() means that we yield() the current (virtual) cpu and therefore is very expensive, and would contradict the whole purpose of the lockless cmpxchg loop. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-27sysfs: Allow mounting without CONFIG_NETEric W. Biederman
In kobj_ns_current_may_mount the default should be to allow the mount. The test is only for a single kobj_ns_type at a time, and unless there is a reason to prevent it the mounting sysfs should be allowed. Subsystems that are not registered can't have are not involved so can't have a reason to prevent mounting sysfs. This is a bug-fix to commit 7dc5dbc879bd ("sysfs: Restrict mounting sysfs") that came in via the userns tree during the 3.12 merge window. Reported-and-tested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-27lockref: allow relaxed cmpxchg64 variant for lockless updatesWill Deacon
The 64-bit cmpxchg operation on the lockref is ordered by virtue of hazarding between the cmpxchg operation and the reference count manipulation. On weakly ordered memory architectures (such as ARM), it can be of great benefit to omit the barrier instructions where they are not needed. This patch moves the lockless lockref code over to a cmpxchg64_relaxed operation, which doesn't provide barrier semantics. If the operation isn't defined, we simply #define it as the usual 64-bit cmpxchg macro. Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-26kobject: introduce kobj_completionJeff Mahoney
A common way to handle kobject lifetimes in embedded in objects with different lifetime rules is to pair the kobject with a struct completion. This introduces a kobj_completion structure that can be used in place of the pairing, along with several convenience functions for initialization, release, and put-and-wait. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26sysfs: drop kobj_ns_type handlingTejun Heo
The way namespace tags are implemented in sysfs is more complicated than necessary. As each tag is a pointer value and required to be non-NULL under a namespace enabled parent, there's no need to record separately what type each tag is or where namespace is enabled. If multiple namespace types are needed, which currently aren't, we can simply compare the tag to a set of allowed tags in the superblock assuming that the tags, being pointers, won't have the same value across multiple types. Also, whether to filter by namespace tag or not can be trivially determined by whether the node has any tagged children or not. This patch rips out kobj_ns_type handling from sysfs. sysfs no longer cares whether specific type of namespace is enabled or not. If a sysfs_dirent has a non-NULL tag, the parent is marked as needing namespace filtering and the value is tested against the allowed set of tags for the superblock (currently only one but increasing this number isn't difficult) and the sysfs_dirent is ignored if it doesn't match. This removes most kobject namespace knowledge from sysfs proper which will enable proper separation and layering of sysfs. The namespace sanity checks in fs/sysfs/dir.c are replaced by the new sanity check in kobject_namespace(). As this is the only place ktype->namespace() is called for sysfs, this doesn't weaken the sanity check significantly. I omitted converting the sanity check in sysfs_do_create_link_sd(). While the check can be shifted to upper layer, mistakes there are well contained and should be easily visible anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26sysfs: remove ktype->namespace() invocations in directory codeTejun Heo
For some unrecognizable reason, namespace information is communicated to sysfs through ktype->namespace() callback when there's *nothing* which needs the use of a callback. The whole sequence of operations is completely synchronous and sysfs operations simply end up calling back into the layer which just invoked it in order to find out the namespace information, which is completely backwards, obfuscates what's going on and unnecessarily tangles two separate layers. This patch doesn't remove ktype->namespace() but shifts its handling to kobject layer. We probably want to get rid of the callback in the long term. This patch adds an explicit param to sysfs_{create|rename|move}_dir() and renames them to sysfs_{create|rename|move}_dir_ns(), respectively. ktype->namespace() invocations are moved to the calling sites of the above functions. A new helper kboject_namespace() is introduced which directly tests kobj_ns_type_operations->type which should give the same result as testing sysfs_fs_type(parent_sd) and returns @kobj's namespace tag as necessary. kobject_namespace() is extern as it will be used from another file in the following patches. This patch should be an equivalent conversion without any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>