Age | Commit message (Collapse) | Author |
|
initialization
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Since we introduced N_MEMORY, we update the initialization of node_states.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pass vma instead of mm and add address parameter.
In most cases we already have vma on the stack. We provides
split_huge_page_pmd_mm() for few cases when we have mm, but not vma.
This change is preparation to huge zero pmd splitting implementation.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
"All architectures are converted to new model. Quite a bit of that
stuff is actually shared with architecture trees; in such cases it's
literally shared branch pulled by both, not a cherry-pick.
A lot of ugliness and black magic is gone (-3KLoC total in this one):
- kernel_thread()/kernel_execve()/sys_execve() redesign.
We don't do syscalls from kernel anymore for either kernel_thread()
or kernel_execve():
kernel_thread() is essentially clone(2) with callback run before we
return to userland, the callbacks either never return or do
successful do_execve() before returning.
kernel_execve() is a wrapper for do_execve() - it doesn't need to
do transition to user mode anymore.
As a result kernel_thread() and kernel_execve() are
arch-independent now - they live in kernel/fork.c and fs/exec.c
resp. sys_execve() is also in fs/exec.c and it's completely
architecture-independent.
- daemonize() is gone, along with its parts in fs/*.c
- struct pt_regs * is no longer passed to do_fork/copy_process/
copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.
- sys_fork()/sys_vfork()/sys_clone() unified; some architectures
still need wrappers (ones with callee-saved registers not saved in
pt_regs on syscall entry), but the main part of those suckers is in
kernel/fork.c now."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
do_coredump(): get rid of pt_regs argument
print_fatal_signal(): get rid of pt_regs argument
ptrace_signal(): get rid of unused arguments
get rid of ptrace_signal_deliver() arguments
new helper: signal_pt_regs()
unify default ptrace_signal_deliver
flagday: kill pt_regs argument of do_fork()
death to idle_regs()
don't pass regs to copy_process()
flagday: don't pass regs to copy_thread()
bfin: switch to generic vfork, get rid of pointless wrappers
xtensa: switch to generic clone()
openrisc: switch to use of generic fork and clone
unicore32: switch to generic clone(2)
score: switch to generic fork/vfork/clone
c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
mn10300: switch to generic fork/vfork/clone
h8300: switch to generic fork/vfork/clone
tile: switch to generic clone()
...
Conflicts:
arch/microblaze/include/asm/Kbuild
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer update from Ingo Molnar:
"This tree includes HPET fixes and also implements a calibration-free,
TSC match driven APIC timer interrupt mode: 'TSC deadline mode'
supported in SandyBridge and later CPUs."
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: hpet: Fix inverted return value check in arch_setup_hpet_msi()
x86: hpet: Fix masking of MSI interrupts
x86: apic: Use tsc deadline for oneshot when available
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull "Nuke 386-DX/SX support" from Ingo Molnar:
"This tree removes ancient-386-CPUs support and thus zaps quite a bit
of complexity:
24 files changed, 56 insertions(+), 425 deletions(-)
... which complexity has plagued us with extra work whenever we wanted
to change SMP primitives, for years.
Unfortunately there's a nostalgic cost: your old original 386 DX33
system from early 1991 won't be able to boot modern Linux kernels
anymore. Sniff."
I'm not sentimental. Good riddance.
* 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, 386 removal: Document Nx586 as a 386 and thus unsupported
x86, cleanups: Simplify sync_core() in the case of no CPUID
x86, 386 removal: Remove CONFIG_X86_POPAD_OK
x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK
x86, 386 removal: Remove CONFIG_INVLPG
x86, 386 removal: Remove CONFIG_BSWAP
x86, 386 removal: Remove CONFIG_XADD
x86, 386 removal: Remove CONFIG_CMPXCHG
x86, 386 removal: Remove CONFIG_M386 from Kconfig
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 topology discovery improvements from Ingo Molnar:
"These changes improve topology discovery on AMD CPUs.
Right now this feeds information displayed in
/sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we
could use this to set up a better scheduling topology."
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD
x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD
x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD
x86: Add cpu_has_topoext
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Small cleanups."
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix the error of using "const" in gen-insn-attr-x86.awk
x86, apic: Cleanup cfg->domain setup for legacy interrupts
x86: Remove dead hlt_use_halt code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 BSP hotplug changes from Ingo Molnar:
"This tree enables CPU#0 (the boot processor) to be onlined/offlined on
x86, just like any other CPU. Enabled on Intel CPUs for now.
Allowing this required the identification and fixing of latent CPU#0
assumptions (such as CPU#0 initializations, etc.) in the x86
architecture code, plus the identification of barriers to
BSP-offlining, such as active PIC interrupts which can only be
serviced on the BSP.
It's behind a default-off option, and there's a debug option that
allows the automatic testing of this feature.
The motivation of this feature is to allow and prepare for true
CPU-hotplug hardware support: recent changes to MCE support enable us
to detect a deteriorating but not yet hard-failing L1/L2 cache on a
CPU that could be soft-unplugged - or a failing L3 cache on a
multi-socket system.
Note that true hardware hot-plug is not yet fully enabled by this,
because that requires a special platform wakeup sequence to be sent to
the freshly powered up CPU#0. Future patches for this are planned,
once such a platform exists. Chicken and egg"
* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, topology: Debug CPU0 hotplug
x86/i387.c: Initialize thread xstate only on CPU0 only once
x86, hotplug: Handle retrigger irq by the first available CPU
x86, hotplug: The first online processor saves the MTRR state
x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
x86-32, hotplug: Add start_cpu0() entry point to head_32.S
x86-64, hotplug: Add start_cpu0() entry point to head_64.S
kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
x86, hotplug, suspend: Online CPU0 for suspend or hibernate
x86, hotplug: Support functions for CPU0 online/offline
x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
x86, Kconfig: Add config switch for CPU0 hotplug
doc: Add x86 CPU0 online/offline feature
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Ingo Molnar:
"Two small changes: a cleanup and allow CONFIG_X86_MPPARSE to be turned
off on SFI as well."
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present
x86/boot/doc: Fix grammar and typo in boot.txt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
"Two fixlets and a cleanup."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_32: Return actual stack when requesting sp from regs
x86: Don't clobber top of pt_regs in nested NMI
x86/asm: Clean up copy_page_*() comments and code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Lots of activity:
211 files changed, 8328 insertions(+), 4116 deletions(-)
most of it on the tooling side.
Main changes:
* ftrace enhancements and fixes from Steve Rostedt.
* uprobes fixes, cleanups and preparation for the ARM port from Oleg
Nesterov.
* UAPI fixes, from David Howels - prepares the arch/x86 UAPI
transition
* Separate perf tests into multiple objects, one per test, from Jiri
Olsa.
* Make hardware event translations available in sysfs, from Jiri
Olsa.
* Fixes to /proc/pid/maps parsing, preparatory to supporting data
maps, from Namhyung Kim
* Implement ui_progress for GTK, from Namhyung Kim
* Add framework for automated perf_event_attr tests, where tools with
different command line options will be run from a 'perf test', via
python glue, and the perf syscall will be intercepted to verify
that the perf_event_attr fields set by the tool are those expected,
from Jiri Olsa
* Add a 'link' method for hists, so that we can have the leader with
buckets for all the entries in all the hists. This new method is
now used in the default 'diff' output, making the sum of the
'baseline' column be 100%, eliminating blind spots.
* libtraceevent fixes for compiler warnings trying to make perf it
build on some distros, like fedora 14, 32-bit, some of the warnings
really pointed to real bugs.
* Add a browser for 'perf script' and make it available from the
report and annotate browsers. It does filtering to find the
scripts that handle events found in the perf.data file used. From
Feng Tang
* perf inject changes to allow showing where a task sleeps, from
Andrew Vagin.
* Makefile improvements from Namhyung Kim.
* Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
* Don't stop synthesizing threads when one vanishes, this is for the
existing threads when we start a tool like trace.
* Use sched:sched_stat_runtime to provide a thread summary, this
produces the same output as the 'trace summary' subcommand of
tglx's original "trace" tool.
* Support interrupted syscalls in 'trace'
* Add an event duration column and filter in 'trace'.
* There are references to the man pages in some tools, so try to
build Documentation when installing, warning the user if that is
not possible, from Borislav Petkov.
* Give user better message if precise is not supported, from David
Ahern.
* Try to find cross-built objdump path by using the session
environment information in the perf.data file header, from Irina
Tirdea, original patch and idea by Namhyung Kim.
* Diplays more output on features check for make V=1, so that one can
figure out what is happening by looking at gcc output, etc. From
Jiri Olsa.
* Add on_exit implementation for systems without one, e.g. Android,
from Bernhard Rosenkraenzer.
* Only process events for vcpus of interest, helps handling large
number of events, from David Ahern.
* Cross compilation fixes for Android, from Irina Tirdea.
* Add documentation on compiling for Android, from Irina Tirdea.
* perf diff improvements from Jiri Olsa.
* Target (task/user/cpu/syswide) handling improvements, from Namhyung
Kim.
* Add support in 'trace' for tracing workload given by command line,
from Namhyung Kim.
* ... and much more."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
perf evsel: Introduce is_group_member method
perf powerpc: Use uapi/unistd.h to fix build error
tools: Pass the target in descend
tools: Honour the O= flag when tool build called from a higher Makefile
tools: Define a Makefile function to do subdir processing
perf ui: Always compile browser setup code
perf ui: Add ui_progress__finish()
perf ui gtk: Implement ui_progress functions
perf ui: Introduce generic ui_progress helper
perf ui tui: Move progress.c under ui/tui directory
perf tools: Add basic event modifier sanity check
perf tools: Omit group members from perf_evlist__disable/enable
perf tools: Ensure single disable call per event in record comand
perf tools: Fix 'disabled' attribute config for record command
perf tools: Fix attributes for '{}' defined event groups
perf tools: Use sscanf for parsing /proc/pid/maps
perf tools: Add gtk.<command> config option for launching GTK browser
perf tools: Fix compile error on NO_NEWT=1 build
perf hists: Initialize all of he->stat with zeroes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU update from Ingo Molnar:
"The major features of this tree are:
1. A first version of no-callbacks CPUs. This version prohibits
offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
Relaxing this constraint is in progress, but not yet ready
for prime time. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/724.
2. Changes to SRCU that allows statically initialized srcu_struct
structures. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/296.
3. Restructuring of RCU's debugfs output. These commits were posted
to LKML at https://lkml.org/lkml/2012/10/30/341.
4. Additional CPU-hotplug/RCU improvements, posted to LKML at
https://lkml.org/lkml/2012/10/30/327.
Note that the commit eliminating __stop_machine() was judged to
be too-high of risk, so is deferred to 3.9.
5. Changes to RCU's idle interface, most notably a new module
parameter that redirects normal grace-period operations to
their expedited equivalents. These were posted to LKML at
https://lkml.org/lkml/2012/10/30/739.
6. Additional diagnostics for RCU's CPU stall warning facility,
posted to LKML at https://lkml.org/lkml/2012/10/30/315.
The most notable change reduces the
default RCU CPU stall-warning time from 60 seconds to 21 seconds,
so that it once again happens sooner than the softlockup timeout.
7. Documentation updates, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/280.
A couple of late-breaking changes were posted at
https://lkml.org/lkml/2012/11/16/634 and
https://lkml.org/lkml/2012/11/16/547.
8. Miscellaneous fixes, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/309.
9. Finally, a fix for an lockdep-RCU splat was posted to LKML
at https://lkml.org/lkml/2012/11/7/486."
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
context_tracking: New context tracking susbsystem
sched: Mark RCU reader in sched_show_task()
rcu: Separate accounting of callbacks from callback-free CPUs
rcu: Add callback-free CPUs
rcu: Add documentation for the new rcuexp debugfs trace file
rcu: Update documentation for TREE_RCU debugfs tracing
rcu: Reduce default RCU CPU stall warning timeout
rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check
rcu: Clarify memory-ordering properties of grace-period primitives
rcu: Add new rcutorture module parameters to start/end test messages
rcu: Remove list_for_each_continue_rcu()
rcu: Fix batch-limit size problem
rcu: Add tracing for synchronize_sched_expedited()
rcu: Remove old debugfs interfaces and also RCU flavor name
rcu: split 'rcuhier' to each flavor
rcu: split 'rcugp' to each flavor
rcu: split 'rcuboost' to each flavor
rcu: split 'rcubarrier' to each flavor
rcu: Fix tracing formatting
rcu: Remove the interface "rcudata.csv"
...
|
|
Merge misc updates from Andrew Morton:
"About half of most of MM. Going very early this time due to
uncertainty over the coreautounifiednumasched things. I'll send the
other half of most of MM tomorrow. The rest of MM awaits a slab merge
from Pekka."
* emailed patches from Andrew Morton: (71 commits)
memory_hotplug: ensure every online node has NORMAL memory
memory_hotplug: handle empty zone when online_movable/online_kernel
mm, memory-hotplug: dynamic configure movable memory and portion memory
drivers/base/node.c: cleanup node_state_attr[]
bootmem: fix wrong call parameter for free_bootmem()
avr32, kconfig: remove HAVE_ARCH_BOOTMEM
mm: cma: remove watermark hacks
mm: cma: skip watermarks check for already isolated blocks in split_free_page()
mm, oom: fix race when specifying a thread as the oom origin
mm, oom: change type of oom_score_adj to short
mm: cleanup register_node()
mm, mempolicy: remove duplicate code
mm/vmscan.c: try_to_freeze() returns boolean
mm: introduce putback_movable_pages()
virtio_balloon: introduce migration primitives to balloon pages
mm: introduce compaction and migration for ballooned pages
mm: introduce a common interface for balloon pages mobility
mm: redefine address_space.assoc_mapping
mm: adjust address_space_operations.migratepage() return code
arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/
...
|
|
Update the i386 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the x86-64 cache alignment code to take pgoff into account. Use the
x86 and MIPS cache alignment code as the basis for a generic cache
alignment function.
The old x86 code will always align the mmap to aliasing boundaries,
even if the program mmaps the file with a non-zero pgoff.
If program A mmaps the file with pgoff 0, and program B mmaps the file
with pgoff 1. The old code would align the mmaps, resulting in misaligned
pages:
A: 0123
B: 123
After this patch, they are aligned so the pages line up:
A: 0123
B: 123
Proposed by Rik van Riel.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use
of vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There was some desire in large applications using MAP_HUGETLB or
SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
others. This is useful together with NUMA policy: use 2MB interleaving
on some mappings, but 1GB on local mappings.
This patch extends the IPC/SHM syscall interfaces slightly to allow
specifying the page size.
It borrows some upper bits in the existing flag arguments and allows
encoding the log of the desired page size in addition to the *_HUGETLB
flag. When 0 is specified the default size is used, this makes the
change fully compatible.
Extending the internal hugetlb code to handle this is straight forward.
Instead of a single mount it just keeps an array of them and selects the
right mount based on the specified page size. When no page size is
specified it uses the mount of the default page size.
The change is not visible in /proc/mounts because internal mounts don't
appear there. It also has very little overhead: the additional mounts
just consume a super block, but not more memory when not used.
I also exported the new flags to the user headers (they were previously
under __KERNEL__). Right now only symbols for x86 and some other
architecture for 1GB and 2MB are defined. The interface should already
work for all other architectures though. Only architectures that define
multiple hugetlb sizes actually need it (that is currently x86, tile,
powerpc). However tile and powerpc have user configurable hugetlb
sizes, so it's not easy to add defines. A program on those
architectures would need to query sysfs and use the appropiate log2.
[akpm@linux-foundation.org: cleanups]
[rientjes@google.com: fix build]
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In real mode CS register is writable, so do not #GP on write.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
If enable_unrestricted_guest is true vmx->rmode.vm86_active will
always be false.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
On CPUs without support for unrestricted guests DPL cannot be smaller
than RPL for data segments during guest entry, but this state can occurs
if a data segment selector changes while vcpu is in real mode to a value
with lowest two bits != 00. Fix that by forcing DPL == RPL on transition
to protected mode.
This is a regression introduced by c865c43de66dc97.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY/Serial merge from Greg Kroah-Hartman:
"Here's the big tty/serial tree set of changes for 3.8-rc1.
Contained in here is a bunch more reworks of the tty port layer from
Jiri and bugfixes from Alan, along with a number of other tty and
serial driver updates by the various driver authors.
Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the
TTY layer, which is much appreciated by me.
All of these have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fixed up some trivial conflicts in the staging tree, due to the fwserial
driver having come in both ways (but fixed up a bit in the serial tree),
and the ioctl handling in the dgrp driver having been done slightly
differently (staging tree got that one right, and removed both
TIOCGSOFTCAR and TIOCSSOFTCAR).
* tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits)
staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer()
staging/fwserial: Remove superfluous free
staging/fwserial: Use WARN_ONCE when port table is corrupted
staging/fwserial: Destruct embedded tty_port on teardown
staging/fwserial: Fix build breakage when !CONFIG_BUG
staging: fwserial: Add TTY-over-Firewire serial driver
drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage
staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user()
staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver
serial: ifx6x60: Add modem power off function in the platform reboot process
serial: mxs-auart: unmap the scatter list before we copy the data
serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled
serial: max310x: Setup missing "can_sleep" field for GPIO
tty/serial: fix ifx6x60.c declaration warning
serial: samsung: add devicetree properties for non-Exynos SoCs
serial: samsung: fix potential soft lockup during uart write
tty: vt: Remove redundant null check before kfree.
tty/8250 Add check for pci_ioremap_bar failure
tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards
tty/8250 Add XR17D15x devices to the exar_handle_irq override
...
|
|
This removes the sparse warning:
arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces)
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
- Introduction of device PM QoS flags.
- ACPI device power management update allowing subsystems other than
PCI to use it more easily.
- ACPI device enumeration rework allowing additional kinds of devices
to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter,
Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki.
- ACPICA update to version 20121018 from Bob Moore and Lv Zheng.
- ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu.
- Introduction of acpi_handle_<level>() messaging macros and ACPI-based
CPU hot-remove support from Toshi Kani.
- ACPI EC updates from Feng Tang.
- cpufreq updates from Viresh Kumar, Fabio Baltieri and others.
- cpuidle changes to quickly notice governor prediction failure from
Youquan Song.
- Support for using multiple cpuidle drivers at the same time and
cpuidle cleanups from Daniel Lezcano.
- devfreq updates from Nishanth Menon and others.
- cpupower update from Thomas Renninger.
- Fixes and small cleanups all over the place.
* tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits)
mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6
ACPI: add Haswell LPSS devices to acpi_platform_device_ids list
ACPI: add documentation about ACPI 5 enumeration
pnpacpi: fix incorrect TEST_ALPHA() test
ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h
ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000
ACPI : do not use Lid and Sleep button for S5 wakeup
ACPI / PNP: Do not crash due to stale pointer use during system resume
ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist
ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set
spi / ACPI: add ACPI enumeration support
gpio / ACPI: add ACPI support
PM / devfreq: remove compiler error with module governors (2)
cpupower: IvyBridge (0x3a and 0x3e models) support
cpupower: Provide -c param for cpupower monitor to schedule process on all cores
cpupower tools: Fix warning and a bug with the cpu package count
cpupower tools: Fix malloc of cpu_info structure
cpupower tools: Fix issues with sysfs_topology_read_file
cpupower tools: Fix minor warnings
cpupower tools: Update .gitignore for files created in the debug directories
...
|
|
NOTE: This patch is based on "sched, numa, mm: Add fault driven
placement and migration policy" but as it throws away all the policy
to just leave a basic foundation I had to drop the signed-offs-by.
This patch creates a bare-bones method for setting PTEs pte_numa in the
context of the scheduler that when faulted later will be faulted onto the
node the CPU is running on. In itself this does nothing useful but any
placement policy will fundamentally depend on receiving hints on placement
from fault context and doing something intelligent about it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
|
|
Implement pte_numa and pmd_numa.
We must atomically set the numa bit and clear the present bit to
define a pte_numa or pmd_numa.
Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
a thread touches a virtual address in the corresponding virtual range,
a NUMA hinting page fault will trigger. The NUMA hinting page fault
will clear the NUMA bit and set the present bit again to resolve the
page fault.
The expectation is that a NUMA hinting page fault is used as part
of a placement policy that decides if a page should remain on the
current node or migrated to a different node.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
|
|
The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page
faults to identify the per NUMA node working set of the thread at
runtime.
Arming the NUMA hinting page fault mechanism works similarly to
setting up a mprotect(PROT_NONE) virtual range: the present bit is
cleared at the same time that _PAGE_NUMA is set, so when the fault
triggers we can identify it as a NUMA hinting page fault.
_PAGE_NUMA on x86 shares the same bit number of _PAGE_PROTNONE (but it
could also use a different bitflag, it's up to the architecture to
decide).
It would be confusing to call the "NUMA hinting page faults" as
"do_prot_none faults". They're different events and _PAGE_NUMA doesn't
alter the semantics of mprotect(PROT_NONE) in any way.
Sharing the same bitflag with _PAGE_PROTNONE in fact complicates
things: it requires us to ensure the code paths executed by
_PAGE_PROTNONE remains mutually exclusive to the code paths executed
by _PAGE_NUMA at all times, to avoid _PAGE_NUMA and _PAGE_PROTNONE to
step into each other toes.
Because we want to be able to set this bitflag in any established pte
or pmd (while clearing the present bit at the same time) without
losing information, this bitflag must never be set when the pte and
pmd are present, so the bitflag picked for _PAGE_NUMA usage, must not
be used by the swap entry format.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
|
|
We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that
the pte is associated with a page.
However, for TLB flushing purposes, we would like to know whether the pte
points to an actually accessible page. This allows us to skip remote TLB
flushes for pages that are not actually accessible.
Fill in this method for x86 and provide a safe (but slower) method
on other architectures.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixed-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org
[ Added Linus's review fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Intel has an architectural guarantee that the TLB entry causing
a page fault gets invalidated automatically. This means
we should be able to drop the local TLB invalidation.
Because of the way other areas of the page fault code work,
chances are good that all x86 CPUs do this. However, if
someone somewhere has an x86 CPU that does not invalidate
the TLB entry causing a page fault, this one-liner should
be easy to revert.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
|
|
The function ptep_set_access_flags() is only ever invoked to set access
flags or add write permission on a PTE. The write bit is only ever set
together with the dirty bit.
Because we only ever upgrade a PTE, it is safe to skip flushing entries on
remote TLBs. The worst that can happen is a spurious page fault on other
CPUs, which would flush that TLB entry.
Lazily letting another CPU incur a spurious page fault occasionally is
(much!) cheaper than aggressively flushing everybody else's TLB.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
|
|
* pci/mjg-pci-roms-from-efi:
PCI: Use phys_addr_t for physical ROM address
|
|
The original version code causes following sparse warnings:
arch/x86/lib/inat-tables.c:1080:25: warning: duplicate const
arch/x86/lib/inat-tables.c:1095:25: warning: duplicate const
arch/x86/lib/inat-tables.c:1118:25: warning: duplicate const
for the variables inat_escape_tables, inat_group_tables, and inat_avx_tables
in the code generated by gen-insn-attr-x86.awk.
The author Masami Hiramutsu says here is to make both the value pointed by the
pointers and the pointers itself read-only, so we move the "const" to be after
the "*".
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Link: http://lkml.kernel.org/r/20121209082103.GA9181@gmail.com
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Use phys_addr_t rather than "void *" for physical memory address.
This removes casts and fixes a "cast from pointer to integer of different
size" warning on ppc44x_defconfig.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull ftrace updates from Steve Rostedt.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
tools/perf/Makefile
tools/perf/builtin-test.c
tools/perf/perf.h
tools/perf/tests/parse-events.c
tools/perf/util/evsel.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
* pci/daniel-numachip:
x86/PCI: Add NumaChip remote PCI support
|
|
Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
preventing access to AMD Northbridges which shouldn't respond.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
* pci/mjg-pci-roms-from-efi:
x86: Use PCI setup data
PCI: Add support for non-BAR ROMs
PCI: Add pcibios_add_device
EFI: Stash ROMs if they're not in the PCI BAR
|
|
The vmclear function will be assigned to the callback function pointer
when loading kvm-intel module. And the bitmap indicates whether we
should do VMCLEAR operation in kdump. The bits in the bitmap are
set/unset according to different conditions.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
This patch provides a way to VMCLEAR VMCSs related to guests
on all cpus before executing the VMXOFF when doing kdump. This
is used to ensure the VMCSs in the vmcore updated and
non-corrupted.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.
Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
CAST5 and CAST6 both use same lookup tables, which can be moved shared module
'cast_common'.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
There are two cases we need to adjust page size in set_spte:
1): the one is other vcpu creates new sp in the window between mapping_level()
and acquiring mmu-lock.
2): the another case is the new sp is created by itself (page-fault path) when
guest uses the target gfn as its page table.
In current code, set_spte drop the spte and emulate the access for these case,
it works not good:
- for the case 1, it may destroy the mapping established by other vcpu, and
do expensive instruction emulation.
- for the case 2, it may emulate the access even if the guest is accessing
the page which not used as page table. There is a example, 0~2M is used as
huge page in guest, in this huge page, only page 3 used as page table, then
guest read/writes on other pages can cause instruction emulation.
Both of these cases can be fixed by allowing guest to retry the access, it
will refault, then we can establish the mapping by using small page
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
EFI can provide PCI ROMs out of band via boot services, which may not be
available after boot. Add support for using the data handed off to us by
the boot stub or bootloader.
[bhelgaas: added Seth's boot_params section mismatch fix]
[bhelgaas: drop "boot_params.hdr.version < 0x0209" test]
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
|
|
EFI provides support for providing PCI ROMs via means other than the ROM
BAR. This support vanishes after we've exited boot services, so add support
for stashing copies of the ROMs in setup_data if they're not otherwise
available.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
|
|
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
generic code path. General-purpose registers are now cleared on reset and
INIT. SVM code properly initializes EDX.
Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability
reporting, so remove it from KVM to avoid conflicts in future.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Conflicts:
arch/x86/kernel/ptrace.c
Pull the latest RCU tree from Paul E. McKenney:
" The major features of this series are:
1. A first version of no-callbacks CPUs. This version prohibits
offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
Relaxing this constraint is in progress, but not yet ready
for prime time. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb.
2. Changes to SRCU that allows statically initialized srcu_struct
structures. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu.
3. Restructuring of RCU's debugfs output. These commits were posted
to LKML at https://lkml.org/lkml/2012/10/30/341, and are at
branch rcu/tracing.
4. Additional CPU-hotplug/RCU improvements, posted to LKML at
https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug.
Note that the commit eliminating __stop_machine() was judged to
be too-high of risk, so is deferred to 3.9.
5. Changes to RCU's idle interface, most notably a new module
parameter that redirects normal grace-period operations to
their expedited equivalents. These were posted to LKML at
https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle.
6. Additional diagnostics for RCU's CPU stall warning facility,
posted to LKML at https://lkml.org/lkml/2012/10/30/315, and
are at branch rcu/stall. The most notable change reduces the
default RCU CPU stall-warning time from 60 seconds to 21 seconds,
so that it once again happens sooner than the softlockup timeout.
7. Documentation updates, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc.
A couple of late-breaking changes were posted at
https://lkml.org/lkml/2012/11/16/634 and
https://lkml.org/lkml/2012/11/16/547.
8. Miscellaneous fixes, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/309, along with a late-breaking
change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID
<20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org
seems to have missed. These are at branch rcu/fixes.
9. Finally, a fix for an lockdep-RCU splat was posted to LKML
at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|