summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-08-06kernel/auditfilter.c: replace count*size kmalloc by kcallocFabian Frederick
kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06ring-buffer: Always reset iterator to reader pageSteven Rostedt (Red Hat)
When performing a consuming read, the ring buffer swaps out a page from the ring buffer with a empty page and this page that was swapped out becomes the new reader page. The reader page is owned by the reader and since it was swapped out of the ring buffer, writers do not have access to it (there's an exception to that rule, but it's out of scope for this commit). When reading the "trace" file, it is a non consuming read, which means that the data in the ring buffer will not be modified. When the trace file is opened, a ring buffer iterator is allocated and writes to the ring buffer are disabled, such that the iterator will not have issues iterating over the data. Although the ring buffer disabled writes, it does not disable other reads, or even consuming reads. If a consuming read happens, then the iterator is reset and starts reading from the beginning again. My tests would sometimes trigger this bug on my i386 box: WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa() Modules linked in: CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M Call Trace: [<c18796b3>] dump_stack+0x4b/0x75 [<c103a0e3>] warn_slowpath_common+0x7e/0x95 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa [<c103a185>] warn_slowpath_fmt+0x33/0x35 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M [<c10bed04>] trace_find_cmdline+0x40/0x64 [<c10c3c16>] trace_print_context+0x27/0xec [<c10c4360>] ? trace_seq_printf+0x37/0x5b [<c10c0b15>] print_trace_line+0x319/0x39b [<c10ba3fb>] ? ring_buffer_read+0x47/0x50 [<c10c13b1>] s_show+0x192/0x1ab [<c10bfd9a>] ? s_next+0x5a/0x7c [<c112e76e>] seq_read+0x267/0x34c [<c1115a25>] vfs_read+0x8c/0xef [<c112e507>] ? seq_lseek+0x154/0x154 [<c1115ba2>] SyS_read+0x54/0x7f [<c188488e>] syscall_call+0x7/0xb ---[ end trace 3f507febd6b4cc83 ]--- >>>> ##### CPU 1 buffer started #### Which was the __trace_find_cmdline() function complaining about the pid in the event record being negative. After adding more test cases, this would trigger more often. Strangely enough, it would never trigger on a single test, but instead would trigger only when running all the tests. I believe that was the case because it required one of the tests to be shutting down via delayed instances while a new test started up. After spending several days debugging this, I found that it was caused by the iterator becoming corrupted. Debugging further, I found out why the iterator became corrupted. It happened with the rb_iter_reset(). As consuming reads may not read the full reader page, and only part of it, there's a "read" field to know where the last read took place. The iterator, must also start at the read position. In the rb_iter_reset() code, if the reader page was disconnected from the ring buffer, the iterator would start at the head page within the ring buffer (where writes still happen). But the mistake there was that it still used the "read" field to start the iterator on the head page, where it should always start at zero because readers never read from within the ring buffer where writes occur. I originally wrote a patch to have it set the iter->head to 0 instead of iter->head_page->read, but then I questioned why it wasn't always setting the iter to point to the reader page, as the reader page is still valid. The list_empty(reader_page->list) just means that it was successful in swapping out. But the reader_page may still have data. There was a bug report a long time ago that was not reproducible that had something about trace_pipe (consuming read) not matching trace (iterator read). This may explain why that happened. Anyway, the correct answer to this bug is to always use the reader page an not reset the iterator to inside the writable ring buffer. Cc: stable@vger.kernel.org # 2.6.28+ Fixes: d769041f8653 "ring_buffer: implement new locking" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-06ring-buffer: Up rb_iter_peek() loop count to 3Steven Rostedt (Red Hat)
After writting a test to try to trigger the bug that caused the ring buffer iterator to become corrupted, I hit another bug: WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238() Modules linked in: ipt_MASQUERADE sunrpc [...] CPU: 1 PID: 5281 Comm: grep Tainted: G W 3.16.0-rc3-test+ #143 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000 ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010 ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003 Call Trace: [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75 [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97 [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238 [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238 [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96 [<ffffffff810c74a3>] ? s_start+0xd7/0x17b [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea [<ffffffff8114cf94>] ? seq_read+0x148/0x361 [<ffffffff81132d98>] ? vfs_read+0x93/0xf1 [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2 Debugging this bug, which triggers when the rb_iter_peek() loops too many times (more than 2 times), I discovered there's a case that can cause that function to legitimately loop 3 times! rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek() only deals with the reader page (it's for consuming reads). The rb_iter_peek() is for traversing the buffer without consuming it, and as such, it can loop for one more reason. That is, if we hit the end of the reader page or any page, it will go to the next page and try again. That is, we have this: 1. iter->head > iter->head_page->page->commit (rb_inc_iter() which moves the iter to the next page) try again 2. event = rb_iter_head_event() event->type_len == RINGBUF_TYPE_TIME_EXTEND rb_advance_iter() try again 3. read the event. But we never get to 3, because the count is greater than 2 and we cause the WARNING and return NULL. Up the counter to 3. Cc: stable@vger.kernel.org # 2.6.37+ Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Steady transitioning of the BPF instructure to a generic spot so all kernel subsystems can make use of it, from Alexei Starovoitov. 2) SFC driver supports busy polling, from Alexandre Rames. 3) Take advantage of hash table in UDP multicast delivery, from David Held. 4) Lighten locking, in particular by getting rid of the LRU lists, in inet frag handling. From Florian Westphal. 5) Add support for various RFC6458 control messages in SCTP, from Geir Ola Vaagland. 6) Allow to filter bridge forwarding database dumps by device, from Jamal Hadi Salim. 7) virtio-net also now supports busy polling, from Jason Wang. 8) Some low level optimization tweaks in pktgen from Jesper Dangaard Brouer. 9) Add support for ipv6 address generation modes, so that userland can have some input into the process. From Jiri Pirko. 10) Consolidate common TCP connection request code in ipv4 and ipv6, from Octavian Purdila. 11) New ARP packet logger in netfilter, from Pablo Neira Ayuso. 12) Generic resizable RCU hash table, with intial users in netlink and nftables. From Thomas Graf. 13) Maintain a name assignment type so that userspace can see where a network device name came from (enumerated by kernel, assigned explicitly by userspace, etc.) From Tom Gundersen. 14) Automatic flow label generation on transmit in ipv6, from Tom Herbert. 15) New packet timestamping facilities from Willem de Bruijn, meant to assist in measuring latencies going into/out-of the packet scheduler, latency from TCP data transmission to ACK, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits) cxgb4 : Disable recursive mailbox commands when enabling vi net: reduce USB network driver config options. tg3: Modify tg3_tso_bug() to handle multiple TX rings amd-xgbe: Perform phy connect/disconnect at dev open/stop amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask net: sun4i-emac: fix memory leak on bad packet sctp: fix possible seqlock seadlock in sctp_packet_transmit() Revert "net: phy: Set the driver when registering an MDIO bus device" cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine team: Simplify return path of team_newlink bridge: Update outdated comment on promiscuous mode net-timestamp: ACK timestamp for bytestreams net-timestamp: TCP timestamping net-timestamp: SCHED timestamp on entering packet scheduler net-timestamp: add key to disambiguate concurrent datagrams net-timestamp: move timestamp flags out of sk_flags net-timestamp: extend SCM_TIMESTAMPING ancillary data struct cxgb4i : Move stray CPL definitions to cxgb4 driver tcp: reduce spurious retransmits due to transient SACK reneging qlcnic: Initialize dcbnl_ops before register_netdev ...
2014-08-06Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "In this release: - PKCS#7 parser for the key management subsystem from David Howells - appoint Kees Cook as seccomp maintainer - bugfixes and general maintenance across the subsystem" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits) X.509: Need to export x509_request_asymmetric_key() netlabel: shorter names for the NetLabel catmap funcs/structs netlabel: fix the catmap walking functions netlabel: fix the horribly broken catmap functions netlabel: fix a problem when setting bits below the previously lowest bit PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1 tpm: simplify code by using %*phN specifier tpm: Provide a generic means to override the chip returned timeouts tpm: missing tpm_chip_put in tpm_get_random() tpm: Properly clean sysfs entries in error path tpm: Add missing tpm_do_selftest to ST33 I2C driver PKCS#7: Use x509_request_asymmetric_key() Revert "selinux: fix the default socket labeling in sock_graft()" X.509: x509_request_asymmetric_keys() doesn't need string length arguments PKCS#7: fix sparse non static symbol warning KEYS: revert encrypted key change ima: add support for measuring and appraising firmware firmware_class: perform new LSM checks security: introduce kernel_fw_from_file hook PKCS#7: Missing inclusion of linux/err.h ...
2014-08-06Rip out get_signal_to_deliver()Richard Weinberger
Now we can turn get_signal() to the main function. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-08-06Clean up signal_delivered()Richard Weinberger
- Pass a ksignal struct to it - Remove unused regs parameter - Make it private as it's nowhere outside of kernel/signal.c is used Signed-off-by: Richard Weinberger <richard@nod.at>
2014-08-06tracehook_signal_handler: Remove sig, info, ka and regsRichard Weinberger
These parameters are nowhere used, so we can remove them. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-08-05Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
2014-08-05Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Nothing spectacular from the irq department this time: - overhaul of the crossbar chip driver - overhaul of the spear shirq chip driver - support for the atmel-aic chip - code move from arch to drivers - the usual tiny fixlets - two reverts worth to mention which undo the too simple attempt of supporting wakeup interrupts on shared interrupt lines" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) Revert "irq: Warn when shared interrupts do not match on NO_SUSPEND" Revert "PM / sleep / irq: Do not suspend wakeup interrupts" irq: Warn when shared interrupts do not match on NO_SUSPEND irqchip: atmel-aic: Define irq fixups for atmel SoCs irqchip: atmel-aic: Implement RTC irq fixup irqchip: atmel-aic: Add irq fixup infrastructure irqchip: atmel-aic: Add atmel AIC/AIC5 drivers irqchip: atmel-aic: Move binding doc to interrupt-controller directory genirq: generic chip: Export irq_map_generic_chip function PM / sleep / irq: Do not suspend wakeup interrupts irqchip: or1k-pic: Migrate from arch/openrisc/ irqchip: crossbar: Allow for quirky hardware with direct hardwiring of GIC documentation: dt: omap: crossbar: Add description for interrupt consumer irqchip: crossbar: Introduce centralized check for crossbar write irqchip: crossbar: Introduce ti, max-crossbar-sources to identify valid crossbar mapping irqchip: crossbar: Add kerneldoc for crossbar_domain_unmap callback irqchip: crossbar: Set cb pointer to null in case of error irqchip: crossbar: Change the goto naming irqchip: crossbar: Return proper error value irqchip: crossbar: Fix kerneldoc warning ...
2014-08-05Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node PM / Hibernate: Remove the old memory-bitmap implementation PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free() PM / Hibernate: Implement position keeping in radix tree PM / Hibernate: Add memory_rtree_find_bit function PM / Hibernate: Create a Radix-Tree to store memory bitmap PM / sleep: fix kernel-doc warnings in drivers/base/power/main.c
2014-08-05Merge branch 'pm-cpuidle'Rafael J. Wysocki
* pm-cpuidle: cpuidle: Remove time measurement in poll state cpuidle: Remove manual selection of the multiple driver support cpuidle: ladder governor - use macro instead of hardcoded value cpuidle: big_little: Fix build error cpuidle: menu governor - remove unused macro STDDEV_THRESH cpuidle: fix permission for driver name sysfs node cpuidle: move idle traces to cpuidle_enter_state()
2014-08-04Merge tag 'staging-3.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver updates from Greg KH: "Here's the big pull request for the staging driver tree for 3.17-rc1. Lots of things in here, over 2000 patches, but the best part is this: 1480 files changed, 39070 insertions(+), 254659 deletions(-) Thanks to the great work of Kristina Martšenko, 14 different staging drivers have been removed from the tree as they were obsolete and no one was willing to work on cleaning them up. Other than the driver removals, loads of cleanups are in here (comedi, lustre, etc.) as well as the usual IIO driver updates and additions. All of this has been in the linux-next tree for a while" * tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (2199 commits) staging: comedi: addi_apci_1564: remove diagnostic interrupt support code staging: comedi: addi_apci_1564: add subdevice to check diagnostic status staging: wlan-ng: coding style problem fix staging: wlan-ng: fixing coding style problems staging: comedi: ii_pci20kc: request and ioremap memory staging: lustre: bitwise vs logical typo staging: dgnc: Remove unneeded dgnc_trace.c and dgnc_trace.h staging: dgnc: rephrase comment staging: comedi: ni_tio: remove some dead code staging: rtl8723au: Fix static symbol sparse warning staging: rtl8723au: usb_dvobj_init(): Remove unused variable 'pdev_desc' staging: rtl8723au: Do not duplicate kernel provided USB macros staging: rtl8723au: Remove never set struct pwrctrl_priv.bHWPowerdown staging: rtl8723au: Remove two never set variables staging: rtl8723au: RSSI_test is never set staging:r8190: coding style: Fixed checkpatch reported Error staging:r8180: coding style: Fixed too long lines staging:r8180: coding style: Fixed commenting style staging: lustre: ptlrpc: lproc_ptlrpc.c - fix dereferenceing user space buffer staging: lustre: ldlm: ldlm_resource.c - fix dereferenceing user space buffer ...
2014-08-04Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Move the nohz kick code out of the scheduler tick to a dedicated IPI, from Frederic Weisbecker. This necessiated quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unecessary barriers on multi-task notification - Remove proliferation of wait_on_bit() action functions and allow wait_on_bit_action() functions to support a timeout. (Neil Brown) - Another round of sched/numa improvements, cleanups and fixes. (Rik van Riel) - Implement fast idling of CPUs when the system is partially loaded, for better scalability. (Tim Chen) - Restructure and fix the CPU hotplug handling code that may leave cfs_rq and rt_rq's throttled when tasks are migrated away from a dead cpu. (Kirill Tkhai) - Robustify the sched topology setup code. (Peterz Zijlstra) - Improve sched_feat() handling wrt. static_keys (Jason Baron) - Misc fixes. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) sched/fair: Fix 'make xmldocs' warning caused by missing description sched: Use macro for magic number of -1 for setparam sched: Robustify topology setup sched: Fix sched_setparam() policy == -1 logic sched: Allow wait_on_bit_action() functions to support a timeout sched: Remove proliferation of wait_on_bit() action functions sched/numa: Revert "Use effective_load() to balance NUMA loads" sched: Fix static_key race with sched_feat() sched: Remove extra static_key*() function indirection sched/rt: Fix replenish_dl_entity() comments to match the current upstream code sched: Transform resched_task() into resched_curr() sched/deadline: Kill task_struct->pi_top_task sched: Rework check_for_tasks() sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime() sched/fair: Disable runtime_enabled on dying rq sched/numa: Change scan period code to match intent sched/numa: Rework best node setting in task_numa_migrate() sched/numa: Examine a task move when examining a task swap sched/numa: Simplify task_numa_compare() sched/numa: Use effective_load() to balance NUMA loads ...
2014-08-04Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Kernel side changes: - Consolidate the PMU interrupt-disabled code amongst architectures (Vince Weaver) - misc fixes Tooling changes (new features, user visible changes): - Add support for pagefault tracing in 'trace', please see multiple examples in the changeset messages (Stanislav Fomichev). - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add header for columns in 'top' and 'report' TUI browsers (Jiri Olsa) - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add IO mode into timechart command (Stanislav Fomichev) - Fallback to syscalls:* when raw_syscalls:* is not available in the perl and python perf scripts. (Daniel Bristot de Oliveira) - Add --repeat global option to 'perf bench' to be used in benchmarks such as the existing 'futex' one, that was modified to use it instead of a local option. (Davidlohr Bueso) - Fix fd -> pathname resolution in 'trace', be it using /proc or a vfs_getname probe point. (Arnaldo Carvalho de Melo) - Add suggestion of how to set perf_event_paranoid sysctl, to help non-root users trying tools like 'trace' to get a working environment. (Arnaldo Carvalho de Melo) - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup (Steven Rostedt, Jan Kiszka) - Support S/390 in 'perf kvm stat' (Alexander Yarygin) Tooling infrastructure changes: - Allow reserving a row for header purposes in the hists browser (Arnaldo Carvalho de Melo) - Various fixes and prep work related to supporting Intel PT (Adrian Hunter) - Introduce multiple debug variables control (Jiri Olsa) - Add callchain and additional sample information for python scripts (Joseph Schuchart) - More prep work to support Intel PT: (Adrian Hunter) - Polishing 'script' BTS output - 'inject' can specify --kallsym - VDSO is per machine, not a global var - Expose data addr lookup functions previously private to 'script' - Large mmap fixes in events processing - Include standard stringify macros in power pc code (Sukadev Bhattiprolu) Tooling cleanups: - Convert open coded equivalents to asprintf() (Andy Shevchenko) - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo) - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo) - No need to reimplement err() in 'perf bench sched-messaging', drop barf(). (Davidlohr Bueso). - Remove ev_name argument from perf_evsel__hists_browse, can be obtained from the other parameters. (Jiri Olsa) Tooling fixes: - Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso) - The -o and -n 'perf bench mem' options are mutually exclusive, emit error when both are specified. (Davidlohr Bueso) - Fix scrollbar refresh row index in the ui browser, problem exposed now that headers will be added and will be allowed to be switched on/off. (Jiri Olsa) - Handle the num array type in python properly (Sebastian Andrzej Siewior) - Fix wrong condition for allocation failure (Jiri Olsa) - Adjust callchain based on DWARF debug info on powerpc (Sukadev Bhattiprolu) - Fix a risk for doing free on uninitialized pointer in traceevent lib (Rickard Strandqvist) - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa) - Enable close-on-exec flag on perf file descriptor (Yann Droneaud) - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo) - Event ordering fixes (Jiri Olsa)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits) Revert "perf tools: Fix jump label always changing during tracing" perf tools: Fix perf usage string leftover perf: Check permission only for parent tracepoint event perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds perf record: Always force PERF_RECORD_FINISHED_ROUND event perf inject: Add --kallsyms parameter perf tools: Expose 'addr' functions so they can be reused perf session: Fix accounting of ordered samples queue perf powerpc: Include util/util.h and remove stringify macros perf tools: Fix build on gcc 4.4.7 perf tools: Add thread parameter to vdso__dso_findnew() perf tools: Add dso__type() perf tools: Separate the VDSO map name from the VDSO dso name perf tools: Add vdso__new() perf machine: Fix the lifetime of the VDSO temporary file perf tools: Group VDSO global variables into a structure perf session: Add ability to skip 4GiB or more perf session: Add ability to 'skip' a non-piped event stream perf tools: Pass machine to vdso__dso_findnew() perf tools: Add dso__data_size() ...
2014-08-04Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - big rtmutex and futex cleanup and robustification from Thomas Gleixner - mutex optimizations and refinements from Jason Low - arch_mutex_cpu_relax() removal and related cleanups - smaller lockdep tweaks" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) arch, locking: Ciao arch_mutex_cpu_relax() locking/lockdep: Only ask for /proc/lock_stat output when available locking/mutexes: Optimize mutex trylock slowpath locking/mutexes: Try to acquire mutex only if it is unlocked locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro locking/mutexes: Correct documentation on mutex optimistic spinning rtmutex: Make the rtmutex tester depend on BROKEN futex: Simplify futex_lock_pi_atomic() and make it more robust futex: Split out the first waiter attachment from lookup_pi_state() futex: Split out the waiter check from lookup_pi_state() futex: Use futex_top_waiter() in lookup_pi_state() futex: Make unlock_pi more robust rtmutex: Avoid pointless requeueing in the deadlock detection chain walk rtmutex: Cleanup deadlock detector debug logic rtmutex: Confine deadlock logic to futex rtmutex: Simplify remove_waiter() rtmutex: Document pi chain walk rtmutex: Clarify the boost/deboost part rtmutex: No need to keep task ref for lock owner check rtmutex: Simplify and document try_to_take_rtmutex() ...
2014-08-04Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molar: "The main changes: - torture-test updates - callback-offloading changes - maintainership changes - update RCU documentation - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() rcu: Fix a sparse warning in rcu_initiate_boost() rcu: Fix __rcu_reclaim() to use true/false for bool rcu: Remove CONFIG_PROVE_RCU_DELAY rcu: Use __this_cpu_read() instead of per_cpu_ptr() rcu: Don't use NMIs to dump other CPUs' stacks rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs rcu: Simplify priority boosting by putting rt_mutex in rcu_node rcu: Check both root and current rcu_node when setting up future grace period rcu: Allow post-unlock reference for rt_mutex rcu: Loosen __call_rcu()'s rcu_head alignment constraint rcu: Eliminate read-modify-write ACCESS_ONCE() calls rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu rcu: Make rcu node arrays static const char * const signal: Explain local_irq_save() call rcu: Handle obsolete references to TINY_PREEMPT_RCU rcu: Document deadlock-avoidance information for rcu_read_unlock() scripts: Teach get_maintainer.pl about the new "R:" tag rcu: Update rcu torture maintainership filename patterns ...
2014-08-04Merge tag 'trace-3.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing filter cleanups from Steven Rostedt: "Oleg Nesterov did several clean ups with the tracing filter code. As he found some small bugs that went into 3.16, and these changes were based on that, I had to apply his changes to a separate branch than my main development branch. This was based on work that was already pulled into 3.16, and is a separate pull request to keep from having local merges in my pull request" * tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Kill "filter_string" arg of replace_preds() tracing: Change apply_subsystem_event_filter() paths to check file->system == dir tracing: Kill ftrace_event_call->files tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Kill call_filter_disable() tracing: Kill destroy_call_preds() tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04Merge tag 'trace-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume Other changes include clean ups of uprobe code, clean up of the trace_seq() code, and other various small fixes and clean ups to ftrace and tracing" * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) ftrace: Add warning if tramp hash does not match nr_trampolines ftrace: Fix trampoline hash update check on rec->flags ring-buffer: Use rb_page_size() instead of open coded head_page size ftrace: Rename ftrace_ops field from trampolines to nr_trampolines tracing: Convert local function_graph functions to static ftrace: Do not copy old hash when resetting tracing: let user specify tracing_thresh after selecting function_graph ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on() tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST s390/ftrace: remove check of obsolete variable function_trace_stop arm64, ftrace: Remove check of obsolete variable function_trace_stop Blackfin: ftrace: Remove check of obsolete variable function_trace_stop metag: ftrace: Remove check of obsolete variable function_trace_stop microblaze: ftrace: Remove check of obsolete variable function_trace_stop MIPS: ftrace: Remove check of obsolete variable function_trace_stop parisc: ftrace: Remove check of obsolete variable function_trace_stop sh: ftrace: Remove check of obsolete variable function_trace_stop sparc64,ftrace: Remove check of obsolete variable function_trace_stop tile: ftrace: Remove check of obsolete variable function_trace_stop ftrace: x86: Remove check of obsolete variable function_trace_stop ...
2014-08-04Merge branch 'for-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Mostly changes to get the v2 interface ready. The core features are mostly ready now and I think it's reasonable to expect to drop the devel mask in one or two devel cycles at least for a subset of controllers. - cgroup added a controller dependency mechanism so that block cgroup can depend on memory cgroup. This will be used to finally support IO provisioning on the writeback traffic, which is currently being implemented. - The v2 interface now uses a separate table so that the interface files for the new interface are explicitly declared in one place. Each controller will explicitly review and add the files for the new interface. - cpuset is getting ready for the hierarchical behavior which is in the similar style with other controllers so that an ancestor's configuration change doesn't change the descendants' configurations irreversibly and processes aren't silently migrated when a CPU or node goes down. All the changes are to the new interface and no behavior changed for the multiple hierarchies" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits) cpuset: fix the WARN_ON() in update_nodemasks_hier() cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core cgroup: distinguish the default and legacy hierarchies when handling cftypes cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[] cpuset: export effective masks to userspace cpuset: allow writing offlined masks to cpuset.cpus/mems cpuset: enable onlined cpu/node in effective masks cpuset: refactor cpuset_hotplug_update_tasks() cpuset: make cs->{cpus, mems}_allowed as user-configured masks cpuset: apply cs->effective_{cpus,mems} cpuset: initialize top_cpuset's configured masks at mount cpuset: use effective cpumask to build sched domains cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty cpuset: update cs->effective_{cpus, mems} when config changes cpuset: update cpuset->effective_{cpus,mems} at hotplug cpuset: add cs->effective_cpus and cs->effective_mems cgroup: clean up sane_behavior handling ...
2014-08-04Merge branch 'for-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Major reorganization of percpu header files which I think makes things a lot more readable and logical than before. - percpu-refcount is updated so that it requires explicit destruction and can be reinitialized if necessary. This was pulled into the block tree to replace the custom percpu refcnting implemented in blk-mq. - In the process, percpu and percpu-refcount got cleaned up a bit * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits) percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() percpu-refcount: require percpu_ref to be exited explicitly percpu-refcount: use unsigned long for pcpu_count pointer percpu-refcount: add helpers for ->percpu_count accesses percpu-refcount: one bit is enough for REF_STATUS percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() percpu: Use ALIGN macro instead of hand coding alignment calculation percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations percpu: preffity percpu header files percpu: use raw_cpu_*() to define __this_cpu_*() percpu: reorder macros in percpu header files percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops percpu: reorganize include/linux/percpu-defs.h percpu: move accessors from include/linux/percpu.h to percpu-defs.h percpu: include/asm-generic/percpu.h should contain only arch-overridable parts percpu: introduce arch_raw_cpu_ptr() ...
2014-08-04Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "Lai has been doing a lot of cleanups of workqueue and kthread_work. No significant behavior change. Just a lot of cleanups all over the place. Some are a bit invasive but overall nothing too dangerous" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: kthread_work: remove the unused wait_queue_head kthread_work: wake up worker only when the worker is idle workqueue: use nr_node_ids instead of wq_numa_tbl_len workqueue: remove the misnamed out_unlock label in get_unbound_pool() workqueue: remove the stale comment in pwq_unbound_release_workfn() workqueue: move rescuer pool detachment to the end workqueue: unfold start_worker() into create_worker() workqueue: remove @wakeup from worker_set_flags() workqueue: remove an unneeded UNBOUND test before waking up the next worker workqueue: wake regular worker if need_more_worker() when rescuer leave the pool workqueue: alloc struct worker on its local node workqueue: reuse the already calculated pwq in try_to_grab_pending() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() workqueue: sanity check pool->cpu in wq_worker_sleeping() workqueue: clear leftover flags when detached workqueue: remove useless WARN_ON_ONCE() workqueue: use schedule_timeout_interruptible() instead of open code workqueue: remove the empty check in too_many_workers() workqueue: use "pool->cpu < 0" to stand for an unbound pool
2014-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - CTR(AES) optimisation on x86_64 using "by8" AVX. - arm64 support to ccp - Intel QAT crypto driver - Qualcomm crypto engine driver - x86-64 assembly optimisation for 3DES - CTR(3DES) speed test - move FIPS panic from module.c so that it only triggers on crypto modules - SP800-90A Deterministic Random Bit Generator (drbg). - more test vectors for ghash. - tweak self tests to catch partial block bugs. - misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits) crypto: drbg - fix failure of generating multiple of 2**16 bytes crypto: ccp - Do not sign extend input data to CCP crypto: testmgr - add missing spaces to drbg error strings crypto: atmel-tdes - Switch to managed version of kzalloc crypto: atmel-sha - Switch to managed version of kzalloc crypto: testmgr - use chunks smaller than algo block size in chunk tests crypto: qat - Fixed SKU1 dev issue crypto: qat - Use hweight for bit counting crypto: qat - Updated print outputs crypto: qat - change ae_num to ae_id crypto: qat - change slice->regions to slice->region crypto: qat - use min_t macro crypto: qat - remove unnecessary parentheses crypto: qat - remove unneeded header crypto: qat - checkpatch blank lines crypto: qat - remove unnecessary return codes crypto: Resolve shadow warnings crypto: ccp - Remove "select OF" from Kconfig crypto: caam - fix DECO RSR polling crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM ...
2014-08-03Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes in the timer area: - a long-standing lock inversion due to a printk - suspend-related hrtimer corruption in sched_clock" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks sched_clock: Avoid corrupting hrtimer tree during suspend
2014-08-02net: filter: split 'struct sk_filter' into socket and bpf partsAlexei Starovoitov
clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02net: filter: rename sk_convert_filter() -> bpf_convert_filter()Alexei Starovoitov
to indicate that this function is converting classic BPF into eBPF and not related to sockets Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02net: filter: rename sk_chk_filter() -> bpf_check_classic()Alexei Starovoitov
trivial rename to indicate that this functions performs classic BPF checking Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-01timer: Fix lock inversion between hrtimer_bases.lock and scheduler locksJan Kara
clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04b #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-31Revert "irq: Warn when shared interrupts do not match on NO_SUSPEND"Thomas Gleixner
This reverts commit 4fae4e7624653ef498d0e2a38f00620b9701ab04. Undo because it breaks working systems. Requested-by: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-31Revert "PM / sleep / irq: Do not suspend wakeup interrupts"Thomas Gleixner
This reverts commit d709f7bcbb3ab01704fa7b37a2e4b981cf3783c1. Undo, because it might break exisiting functionality. Requested-by: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-30kexec: fix build error when hugetlbfs is disabledDavid Rientjes
free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no need to filter PageHuge() page is such a configuration either, so avoid exporting the symbol to fix a build error: In file included from kernel/kexec.c:14:0: kernel/kexec.c: In function 'crash_save_vmcoreinfo_init': kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function) VMCOREINFO_SYMBOL(free_huge_page); ^ Introduced by commit 8f1d26d0e59b ("kexec: export free_huge_page to VMCOREINFO") Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Olof Johansson <olof@lixom.net> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30Josh has movedJosh Triplett
My IBM email addresses haven't worked for years; also map some old-but-functional forwarding addresses to my canonical address. Update my GPG key fingerprint; I moved to 4096R a long time ago. Update description. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30kexec: export free_huge_page to VMCOREINFOAtsushi Kumagai
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need another symbol to filter *hugetlbfs* pages. If a user hope to filter user pages, makedumpfile tries to exclude them by checking the condition whether the page is anonymous, but hugetlbfs pages aren't anonymous while they also be user pages. We know it's possible to detect them in the same way as PageHuge(), so we need the start address of free_huge_page(): int PageHuge(struct page *page) { if (!PageCompound(page)) return 0; page = compound_head(page); return get_compound_page_dtor(page) == free_huge_page; } For that reason, this patch changes free_huge_page() into public to export it to VMCOREINFO. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Acked-by: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30cpuset: fix the WARN_ON() in update_nodemasks_hier()Li Zefan
The WARN_ON() is used to check if we break the legal hierarchy, on which the effective mems should be equal to configured mems. Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Li Zefan <lizefan@huawei.com>
2014-07-29namespaces: Use task_lock and not rcu to protect nsproxyEric W. Biederman
The synchronous syncrhonize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspect nsproxy no longer needs rcu protection for remote reads. remote reads are rare. So optimize for same process reads and write by switching using rask_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit cf7b708c8d1d7a27736771bcf4c457b332b0f818 Make access to task's nsproxy lighter from 2007. The race this originialy fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-29PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_nodeJoerg Roedel
When a memory bitmap is fully populated on a large memory machine (several TB of RAM) it can take more than a minute to walk through all bits. This causes the soft lockup detector on these machine to report warnings. Avoid this by touching the soft lockup watchdog in the memory bitmap walking code. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Remove the old memory-bitmap implementationJoerg Roedel
The radix tree implementatio is proved to work the same as the old implementation now. So the old implementation can be removed to finish the switch to the radix tree for the memory bitmaps. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()Joerg Roedel
The existing implementation of swsusp_free iterates over all pfns in the system and checks every bit in the two memory bitmaps. This doesn't scale very well with large numbers of pfns, especially when the bitmaps are not populated very densly. Change the algorithm to iterate over the set bits in the bitmaps instead to make it scale better in large memory configurations. Also add a memory_bm_clear_current() helper function that clears the bit for the last position returned from the memory bitmap. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Implement position keeping in radix treeJoerg Roedel
Add code to remember the last position that was requested in the radix tree. Use it as a cache for faster linear walking of the bitmap in the memory_bm_rtree_next_pfn() function which is also added with this patch. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Add memory_rtree_find_bit functionJoerg Roedel
Add a function to find a bit in the radix tree for a given pfn. Also add code to the memory bitmap wrapper functions to use the radix tree together with the existing memory bitmap implementation. On read accesses compare the results of both bitmaps to make sure the radix tree behaves the same way. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Create a Radix-Tree to store memory bitmapJoerg Roedel
This patch adds the code to allocate and build the radix tree to store the memory bitmap. The old data structure is left in place until the radix tree implementation is finished. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-28kthread_work: wake up worker only when the worker is idleLai Jiangshan
If the worker is already executing a work item when another is queued, we can safely skip wakeup without worrying about stalling queue thus avoiding waking up the busy worker spuriously. Spurious wakeups should be fine but still isn't nice and avoiding it is trivial here. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-28sched/fair: Fix 'make xmldocs' warning caused by missing descriptionMasanari Iida
This patch fix following warning caused by missing description "overload" in kernel/sched/fair.c Warning(.//kernel/sched/fair.c:5906): No description found for parameter 'overload' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1406518686-7274-1-git-send-email-standby24x7@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28sched: Use macro for magic number of -1 for setparamSteven Rostedt
Instead of passing around a magic number -1 for the sched_setparam() policy, use a more descriptive macro name like SETPARAM_POLICY. [ based on top of Daniel's sched_setparam() fix ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Daniel Bristot de Oliveira<bristot@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140723112826.6ed6cbce@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28sched: Robustify topology setupPeter Zijlstra
We hard assume that higher topology levels are supersets of lower levels. Detect, warn and try to fixup when we encounter this violated. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Bruno Wolff III <bruno@wolff.to> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140722094740.GJ12054@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28Merge branch 'sched/urgent' into sched/core, to merge fixes before applying ↵Ingo Molnar
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28perf: Check permission only for parent tracepoint eventJiri Olsa
There's no need to check cloned event's permission once the parent was already checked. Also the code is checking 'current' process permissions, which is not owner process for cloned events, thus could end up with wrong permission check result. Reported-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Tested-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before ↵Ingo Molnar
applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28sched: Fix sched_setparam() policy == -1 logicDaniel Bristot de Oliveira
The scheduler uses policy == -1 to preserve the current policy state to implement sched_setparam(). But, as (int) -1 is equals to 0xffffffff, it's matching the if (policy & SCHED_RESET_ON_FORK) on _sched_setscheduler(). This match changes the policy value to an invalid value, breaking the sched_setparam() syscall. This patch checks policy == -1 before check the SCHED_RESET_ON_FORK flag. The following program shows the bug: int main(void) { struct sched_param param = { .sched_priority = 5, }; sched_setscheduler(0, SCHED_FIFO, &param); param.sched_priority = 1; sched_setparam(0, &param); param.sched_priority = 0; sched_getparam(0, &param); if (param.sched_priority != 1) printf("failed priority setting (found %d instead of 1)\n", param.sched_priority); else printf("priority setting fine\n"); } Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> # 3.14+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Fixes: 7479f3c9cf67 "sched: Move SCHED_RESET_ON_FORK into attr::sched_flags" Link: http://lkml.kernel.org/r/9ebe0566a08dbbb3999759d3f20d6004bb2dbcfa.1406079891.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>