summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/setup.c
AgeCommit message (Collapse)Author
2013-01-29x86: Merge early_reserve_initrd for 32bit and 64bitYinghai Lu
They are the same, could move them out from head32/64.c to setup.c. We are using memblock, and it could handle overlapping properly, so we don't need to reserve some at first to hold the location, and just need to make sure we reserve them before we are using memblock to find free mem to use. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-15-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, 64bit: Don't set max_pfn_mapped wrong value early on native pathYinghai Lu
We are not having max_pfn_mapped set correctly until init_memory_mapping. So don't print its initial value for 64bit Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup. -v2: update comments about max_pfn_mapped according to Stefano Stabellini. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, 64bit: Use a #PF handler to materialize early mappings on demandH. Peter Anvin
Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all 64-bit code has to use page tables. This makes it awkward before we have first set up properly all-covering page tables to access objects that are outside the static kernel range. So far we have dealt with that simply by mapping a fixed amount of low memory, but that fails in at least two upcoming use cases: 1. We will support load and run kernel, struct boot_params, ramdisk, command line, etc. above the 4 GiB mark. 2. need to access ramdisk early to get microcode to update that as early possible. We could use early_iomap to access them too, but it will make code to messy and hard to be unified with 32 bit. Hence, set up a #PF table and use a fixed number of buffers to set up page tables on demand. If the buffers fill up then we simply flush them and start over. These buffers are all in __initdata, so it does not increase RAM usage at runtime. Thus, with the help of the #PF handler, we can set the final kernel mapping from blank, and switch to init_level4_pgt later. During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. The kernel region itself will be properly mapped; other mappings may be spurious. early_make_pgtable is using kernel high mapping address to access pages to set page table. -v4: Add phys_base offset to make kexec happy, and add init_mapping_kernel() - Yinghai -v5: fix compiling with xen, and add back ident level3 and level2 for xen also move back init_level4_pgt from BSS to DATA again. because we have to clear it anyway. - Yinghai -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai -v7: remove not needed clear_page for init_level4_page it is with fill 512,8,0 already in head_64.S - Yinghai -v8: we need to keep that handler alive until init_mem_mapping and don't let early_trap_init to trash that early #PF handler. So split early_trap_pf_init out and move it down. - Yinghai -v9: switchover only cover kernel space instead of 1G so could avoid touch possible mem holes. - Yinghai -v11: change far jmp back to far return to initial_code, that is needed to fix failure that is reported by Konrad on AMD systems. - Yinghai Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, realmode: Separate real_mode reserve and setupYinghai Lu
After we switch to use #PF handler help to set page table, init_level4_pgt will only have entries set after init_mem_mapping(). We need to move copying init_level4_pgt to trampoline_pgd after that. So split reserve and setup, and move the setup after init_mem_mapping() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-11-git-send-email-yinghai@kernel.org Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86: Factor out e820_add_kernel_range()Yinghai Lu
Separate out the reservation of the kernel static memory areas into a separate function. Also add support for case when memmap=xxM$yyM is used without exactmap. Need to remove reserved range at first before we add E820_RAM range, otherwise added E820_RAM range will be ignored. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-5-git-send-email-yinghai@kernel.org Cc: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29Merge remote-tracking branch 'origin/x86/boot' into x86/mm2H. Peter Anvin
Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25Merge tag 'v3.8-rc5' into x86/mmH. Peter Anvin
The __pa() fixup series that follows touches KVM code that is not present in the existing branch based on v3.7-rc5, so merge in the current upstream from Linus. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-13x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCIH. Peter Anvin
early_pci_allowed() and read_pci_config_16() are only available if CONFIG_PCI is defined. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2013-01-13x86/Sandy Bridge: mark arrays in __init functions as __initconstH. Peter Anvin
Mark static arrays as __initconst so they get removed when the init sections are flushed. Reported-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-11x86/Sandy Bridge: reserve pages when integrated graphics is presentJesse Barnes
SNB graphics devices have a bug that prevent them from accessing certain memory ranges, namely anything below 1M and in the pages listed in the table. So reserve those at boot if set detect a SNB gfx device on the CPU to avoid GPU hangs. Stephane Marchesin had a similar patch to the page allocator awhile back, but rather than reserving pages up front, it leaked them at allocation time. [ hpa: made a number of stylistic changes, marked arrays as static const, and made less verbose; use "memblock=debug" for full verbosity. ] Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-14Merge branch 'x86-acpi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ACPI update from Peter Anvin: "This is a patchset which didn't make the last merge window. It adds a debugging capability to feed ACPI tables via the initramfs. On a grander scope, it formalizes using the initramfs protocol for feeding arbitrary blobs which need to be accessed early to the kernel: they are fed first in the initramfs blob (lots of bootloaders can concatenate this at boot time, others can use a single file) in an uncompressed cpio archive using filenames starting with "kernel/". The ACPI maintainers requested that this patchset be fed via the x86 tree rather than the ACPI tree as the footprint in the general x86 code is much bigger than in the ACPI code proper." * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: X86 ACPI: Use #ifdef not #if for CONFIG_X86 check ACPI: Fix build when disabled ACPI: Document ACPI table overriding via initrd ACPI: Create acpi_table_taint() function to avoid code duplication ACPI: Implement physical address table override ACPI: Store valid ACPI tables passed via early initrd in reserved memblock areas x86, acpi: Introduce x86 arch specific arch_reserve_mem_area() for e820 handling lib: Add early cpio decoder
2012-12-05x86: Use PCI setup dataMatthew Garrett
EFI can provide PCI ROMs out of band via boot services, which may not be available after boot. Add support for using the data handed off to us by the boot stub or bootloader. [bhelgaas: added Seth's boot_params section mismatch fix] [bhelgaas: drop "boot_params.hdr.version < 0x0209" test] Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-11-17x86, mm: kill numa_64.hYinghai Lu
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-44-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Move init_gbpages() out of setup.cYinghai Lu
Put it in mm/init.c, and call it from probe_page_mask(). init_mem_mapping is calling probe_page_mask at first. So calling sequence is not changed. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-32-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Move min_pfn_mapped back to mm/init.cYinghai Lu
Also change it to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: setup page table in top-downYinghai Lu
Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first. Then use mapped pages to map more ranges below, and keep looping until all pages get mapped. alloc_low_page will use page from BRK at first, after that buffer is used up, will use memblock to find and reserve pages for page table usage. Introduce min_pfn_mapped to make sure find new pages from mapped ranges, that will be updated when lower pages get mapped. Also add step_size to make sure that don't try to map too big range with limited mapped pages initially, and increase the step_size when we have more mapped pages on hand. We don't need to call pagetable_reserve anymore, reserve work is done in alloc_low_page() directly. At last we can get rid of calculation and find early pgt related code. -v2: update to after fix_xen change, also use MACRO for initial pgt_buf size and add comments with it. -v3: skip big reserved range in memblock.reserved near end. -v4: don't need fix_xen change now. -v5: add changelog about moving about reserving pagetable to alloc_low_page. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: relocate initrd under all mem for 64bitYinghai Lu
instead of under 4g. For 64bit, we can use any mapped mem instead of low mem. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-17-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Only direct map addresses that are marked as E820_RAMJacob Shin
Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT ) and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not backed by actual DRAM. This is fine for holes under 4GB which are covered by fixed and variable range MTRRs to be UC. However, we run into trouble on higher memory addresses which cannot be covered by MTRRs. Our system with 1TB of RAM has an e820 that looks like this: BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable and so direct mappings are created for huge memory hole between 0x000000e038000000 to 0x0000010000000000. Even though the kernel never generates memory accesses in that region, since the page tables mark them incorrectly as being WB, our (AMD) processor ends up causing a MCE while doing some memory bookkeeping/optimizations around that area. This patch iterates through e820 and only direct maps ranges that are marked as E820_RAM, and keeps track of those pfn ranges. Depending on the alignment of E820 ranges, this may possibly result in using smaller size (i.e. 4K instead of 2M or 1G) page tables. -v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range instead. - Yinghai Lu -v3: add calculate_all_table_space_size() to get correct needed page table size. - Yinghai Lu -v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when mem map does have hole under 4g that is found by Konard on xen domU with 8g ram. - Yinghai Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: use pfn_range_is_mapped() with reserve_initrdYinghai Lu
We are going to map ram only, so under max_low_pfn_mapped, between 4g and max_pfn_mapped does not mean mapped at all. Use pfn_range_is_mapped() to find out if range is mapped for initrd. That could happen bootloader put initrd in range but user could use memmap to carve some of range out. Also during copying need to use early_memmap to map original initrd for accessing. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-15-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: if kernel .text .data .bss are not marked as E820_RAM, complain and fixJacob Shin
There could be cases where user supplied memmap=exactmap memory mappings do not mark the region where the kernel .text .data and .bss reside as E820_RAM, as reported here: https://lkml.org/lkml/2012/8/14/86 Handle it by complaining, and adding the range back into the e820. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-11-git-send-email-yinghai@kernel.org Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Set memblock initial limit to 1MYinghai Lu
memblock_x86_fill() could double memory array. If we set memblock.current_limit to 512M, so memory array could be around 512M. So kdump will not get big range (like 512M) under 1024M. Try to put it down under 1M, it would use about 4k or so, and that is limited. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-10-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Move init_memory_mapping calling out of setup.cYinghai Lu
Now init_memory_mapping is called two times, later will be called for every ram ranges. Could put all related init_mem calling together and out of setup.c. Actually, it reverts commit 1bbbbe7 x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping. will address that later with complete solution include handling hole under 4g. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-5-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17x86, mm: Add global page_size_mask and probe one time onlyYinghai Lu
Now we pass around use_gbpages and use_pse for calculating page table size, Later we will need to call init_memory_mapping for every ram range one by one, that mean those calculation will be done several times. Those information are the same for all ram range and could be stored in page_size_mask and could be probed it one time only. Move that probing code out of init_memory_mapping into separated function probe_page_size_mask(), and call it before all init_memory_mapping. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-2-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-16x86: Use __pa_symbol instead of __pa on C visible symbolsAlexander Duyck
When I made an attempt at separating __pa_symbol and __pa I found that there were a number of cases where __pa was used on an obvious symbol. I also caught one non-obvious case as _brk_start and _brk_end are based on the address of __brk_base which is a C visible symbol. In mark_rodata_ro I was able to reduce the overhead of kernel symbol to virtual memory translation by using a combination of __va(__pa_symbol()) instead of page_address(virt_to_page()). Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215640.8521.80483.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-26Merge tag 'efi-for-3.7' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: "Fix oops with EFI variables on mixed 32/64-bit firmware/kernels and document EFI git repository location on kernel.org." Conflicts: arch/x86/include/asm/efi.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-25x86: efi: Turn off efi_enabled after setup on mixed fw/kernelOlof Johansson
When 32-bit EFI is used with 64-bit kernel (or vice versa), turn off efi_enabled once setup is done. Beyond setup, it is normally used to determine if runtime services are available and we will have none. This will resolve issues stemming from efivars modprobe panicking on a 32/64-bit setup, as well as some reboot issues on similar setups. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45991 Reported-by: Marko Kohtala <marko.kohtala@gmail.com> Reported-by: Maxim Kammerer <mk@dee.su> Signed-off-by: Olof Johansson <olof@lixom.net> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: stable@kernel.org # 3.4 - 3.6 Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-10-24x86, mm: Use memblock memory loop instead of e820_RAMYinghai Lu
We need to handle E820_RAM and E820_RESERVED_KERNEL at the same time. Also memblock has page aligned range for ram, so we could avoid mapping partial pages. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com Acked-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
2012-10-19Merge commit '5bc66170dc486556a1e36fd384463536573f4b82' into x86/urgentH. Peter Anvin
From Borislav Petkov <bp@amd64.org>: Below is a RAS fix which reverts the addition of a sysfs attribute which we agreed is not needed, post-factum. And this should go in now because that sysfs attribute is going to end up in 3.7 otherwise and thus exposed to userspace; removing it then would be a lot harder. This is done as a merge rather than a simple patch/cherry-pick since the baseline for this patch was not in the previous x86/urgent. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-17x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct ↵Jacob Shin
mapping. On systems with very large memory (1 TB in our case), BIOS may report a reserved region or a hole in the E820 map, even above the 4 GB range. Exclude these from the direct mapping. [ hpa: this should be done not just for > 4 GB but for everything above the legacy region (1 MB), at the very least. That, however, turns out to require significant restructuring. That work is well underway, but is not suitable for rc/stable. ] Cc: stable@kernel.org # > 2.6.32 Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-12Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core update from Thomas Gleixner: - Bug fixes (one for a longstanding dead loop issue) - Rework of time related vsyscalls - Alarm timer updates - Jiffies updates to remove compile time dependencies * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Cast raw_interval to u64 to avoid shift overflow timers: Fix endless looping between cascade() and internal_add_timer() time/jiffies: bring back unconditional LATCH definition time: Convert x86_64 to using new update_vsyscall time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems time: Introduce new GENERIC_TIME_VSYSCALL time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD time: Move update_vsyscall definitions to timekeeper_internal.h time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes jiffies: Remove compile time assumptions about CLOCK_TICK_RATE jiffies: Kill unused TICK_USEC_TO_NSEC alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue alarmtimer: Remove unused helpers & defines alarmtimer: Use hrtimer per-alarm instead of per-base alarmtimer: Implement minimum alarm interval for allowing suspend
2012-10-04Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Avi Kivity: "Highlights of the changes for this release include support for vfio level triggered interrupts, improved big real mode support on older Intels, a streamlines guest page table walker, guest APIC speedups, PIO optimizations, better overcommit handling, and read-only memory." * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: s390: Fix vcpu_load handling in interrupt code KVM: x86: Fix guest debug across vcpu INIT reset KVM: Add resampling irqfds for level triggered interrupts KVM: optimize apic interrupt delivery KVM: MMU: Eliminate pointless temporary 'ac' KVM: MMU: Avoid access/dirty update loop if all is well KVM: MMU: Eliminate eperm temporary KVM: MMU: Optimize is_last_gpte() KVM: MMU: Simplify walk_addr_generic() loop KVM: MMU: Optimize pte permission checks KVM: MMU: Update accessed and dirty bits after guest pagetable walk KVM: MMU: Move gpte_access() out of paging_tmpl.h KVM: MMU: Optimize gpte_access() slightly KVM: MMU: Push clean gpte write protection out of gpte_access() KVM: clarify kvmclock documentation KVM: make processes waiting on vcpu mutex killable KVM: SVM: Make use of asm.h KVM: VMX: Make use of asm.h KVM: VMX: Make lto-friendly KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() ... Conflicts: arch/s390/include/asm/processor.h arch/x86/kvm/i8259.c
2012-10-01ACPI: Fix build when disabledDavid Rientjes
"ACPI: Store valid ACPI tables passed via early initrd in reserved memblock areas" breaks the build if either CONFIG_ACPI or CONFIG_BLK_DEV_INITRD is disabled: arch/x86/kernel/setup.c: In function 'setup_arch': arch/x86/kernel/setup.c:944: error: implicit declaration of function 'acpi_initrd_override' or arch/x86/built-in.o: In function `setup_arch': (.init.text+0x1397): undefined reference to `initrd_start' arch/x86/built-in.o: In function `setup_arch': (.init.text+0x139e): undefined reference to `initrd_end' The dummy acpi_initrd_override() function in acpi.h isn't defined without CONFIG_ACPI and initrd_{start,end} are declared but not defined without CONFIG_BLK_DEV_INITRD. [ hpa: applying this as a fix, but this really should be done cleaner ] Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1210012032470.31644@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Len Brown <lenb@kernel.org>
2012-09-30ACPI: Store valid ACPI tables passed via early initrd in reserved memblock areasThomas Renninger
A later patch will compare them with ACPI tables that get loaded at boot or runtime and if criteria match, a stored one is loaded. Signed-off-by: Thomas Renninger <trenn@suse.de> Link: http://lkml.kernel.org/r/1349043837-22659-4-git-send-email-trenn@suse.de Cc: Len Brown <lenb@kernel.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Piel <eric.piel@tremplin-utc.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-09-24jiffies: Remove compile time assumptions about CLOCK_TICK_RATEJohn Stultz
CLOCK_TICK_RATE is used to accurately caclulate exactly how a tick will be at a given HZ. This is useful, because while we'd expect NSEC_PER_SEC/HZ, the underlying hardware will have some granularity limit, so we won't be able to have exactly HZ ticks per second. This slight error can cause timekeeping quality problems when using the jiffies or other jiffies driven clocksources. Thus we currently use compile time CLOCK_TICK_RATE value to generate SHIFTED_HZ and NSEC_PER_JIFFIES, which we then use to adjust the jiffies clocksource to correct this error. Unfortunately though, since CLOCK_TICK_RATE is a compile time value, and the jiffies clocksource is registered very early during boot, there are a number of cases where there are different possible hardware timers that have different tick rates. This causes problems in cases like ARM where there are numerous different types of hardware, each having their own compile-time CLOCK_TICK_RATE, making it hard to accurately support different hardware with a single kernel. For the most part, this doesn't matter all that much, as not too many systems actually utilize the jiffies or jiffies driven clocksource. Usually there are other highres clocksources who's granularity error is negligable. Even so, we have some complicated calcualtions that we do everywhere to handle these edge cases. This patch removes the compile time SHIFTED_HZ value, and introduces a register_refined_jiffies() function. This results in the default jiffies clock as being assumed a perfect HZ freq, and allows archtectures that care about jiffies accuracy to call register_refined_jiffies() with the tick rate, specified dynamically at boot. This allows us, where necessary, to not have a compile time CLOCK_TICK_RATE constant, simplifies the jiffies code, and still provides a way to have an accurate jiffies clock. NOTE: Since this patch does not add register_refinied_jiffies() calls for every arch, it may cause time quality regressions in some cases. Its likely these will not be noticable, but if they are an issue, adding the following to the end of setup_arch() should resolve the regression: register_refinied_jiffies(CLOCK_TICK_RATE) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-09-12x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done()Attilio Rao
At this stage x86_init.paging.pagetable_setup_done is only used in the XEN case. Move its content in the x86_init.paging.pagetable_init setup function and remove the now unused x86_init.paging.pagetable_setup_done remaining infrastructure. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-5-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-12x86: Move paging_init() call to x86_init.paging.pagetable_init()Attilio Rao
Move the paging_init() call to the platform specific pagetable_init() function, so we can get rid of the extra pagetable_setup_done() function pointer. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-4-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-12x86: Rename pagetable_setup_start() to pagetable_init()Attilio Rao
In preparation for unifying the pagetable_setup_start() and pagetable_setup_done() setup functions, rename appropriately all the infrastructure related to pagetable_setup_start(). Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Ackedd-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-3-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-12x86: Remove base argument from x86_init.paging.pagetable_setup_startAttilio Rao
We either use swapper_pg_dir or the argument is unused. Preparatory patch to simplify platform pagetable setup further. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Ackedb-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-2-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-23x86: KVM guest: merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUESTMarcelo Tosatti
The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is not so clear anymore, as demonstrated by recent bugs caused by poor handling of on/off combinations of these options. Merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST. Reported-By: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-06-15Merge branch 'x86/cleanups' into x86/apicIngo Molnar
Merge in the cleanups because a followup x86/apic change relies on them. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-29Merge branch 'x86-trampoline-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 trampoline rework from H. Peter Anvin: "This code reworks all the "trampoline"/"realmode" code (various bits that need to live in the first megabyte of memory, most but not all of which runs in real mode at some point) in the kernel into a single object. The main reason for doing this is that it eliminates the last place in the kernel where we needed pages to be mapped RWX. This code separates all that code into proper R/RW/RX pages." Fix up conflicts in arch/x86/kernel/Makefile (mca removed next to reboot code), and arch/x86/kernel/reboot.c (reboot code moved around in one branch, modified in this one), and arch/x86/tools/relocs.c (mostly same code came in earlier due to working around the ld bugs just before the 3.4 release). Also remove stale x86-relocs entry from scripts/.gitignore as per Peter Anvin. * commit '61f5446169046c217a5479517edac3a890c3bee7': (36 commits) x86, realmode: Move end signature into header.S x86, relocs: When printing an error, say relative or absolute x86, relocs: More relocations which may end up as absolute x86, relocs: Workaround for binutils 2.22.52.0.1 section bug xen-acpi-processor: Add missing #include <xen/xen.h> acpi, bgrd: Add missing <linux/io.h> to drivers/acpi/bgrt.c x86, realmode: Change EFER to a single u64 field x86, realmode: Move kernel/realmode.c to realmode/init.c x86, realmode: Move not-common bits out of trampoline_common.S x86, realmode: Mask out EFER.LMA when saving trampoline EFER x86, realmode: Fix no cache bits test in reboot_32.S x86, realmode: Make sure all generated files are listed in targets x86, realmode: build fix: remove duplicate build x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline x86, realmode: fixes compilation issue in tboot.c x86, realmode: move relocs from scripts/ to arch/x86/tools x86, realmode: header for trampoline code x86, realmode: flattened rm hierachy x86, realmode: don't copy real_mode_header x86, realmode: fix 64-bit wakeup sequence ...
2012-05-29x86: print physical addresses consistently with other parts of kernelBjorn Helgaas
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -found SMP MP-table at [ffff8800000fce90] fce90 +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90] -initial memory mapped : 0 - 20000000 +initial memory mapped: [mem 0x00000000-0x1fffffff] -Base memory trampoline at [ffff88000009c000] 9c000 size 8192 +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000] -SRAT: Node 0 PXM 0 0-80000000 +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull CMA and ARM DMA-mapping updates from Marek Szyprowski: "These patches contain two major updates for DMA mapping subsystem (mainly for ARM architecture). First one is Contiguous Memory Allocator (CMA) which makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. The main difference from the similar frameworks is the fact that CMA allows to transparently reuse the memory region reserved for the big chunk allocation as a system memory, so no memory is wasted when no big chunk is allocated. Once the alloc request is issued, the framework migrates system pages to create space for the required big chunk of physically contiguous memory. For more information one can refer to nice LWN articles: - 'A reworked contiguous memory allocator': http://lwn.net/Articles/447405/ - 'CMA and ARM': http://lwn.net/Articles/450286/ - 'A deep dive into CMA': http://lwn.net/Articles/486301/ - and the following thread with the patches and links to all previous versions: https://lkml.org/lkml/2012/4/3/204 The main client for this new framework is ARM DMA-mapping subsystem. The second part provides a complete redesign in ARM DMA-mapping subsystem. The core implementation has been changed to use common struct dma_map_ops based infrastructure with the recent updates for new dma attributes merged in v3.4-rc2. This allows to use more than one implementation of dma-mapping calls and change/select them on the struct device basis. The first client of this new infractructure is dmabounce implementation which has been completely cut out of the core, common code. The last patch of this redesign update introduces a new, experimental implementation of dma-mapping calls on top of generic IOMMU framework. This lets ARM sub-platform to transparently use IOMMU for DMA-mapping calls if one provides required IOMMU hardware. For more information please refer to the following thread: http://www.spinics.net/lists/arm-kernel/msg175729.html The last patch merges changes from both updates and provides a resolution for the conflicts which cannot be avoided when patches have been applied on the same files (mainly arch/arm/mm/dma-mapping.c)." Acked by Andrew Morton <akpm@linux-foundation.org>: "Yup, this one please. It's had much work, plenty of review and I think even Russell is happy with it." * 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: (28 commits) ARM: dma-mapping: use PMD size for section unmap cma: fix migration mode ARM: integrate CMA with DMA-mapping subsystem X86: integrate CMA with DMA-mapping subsystem drivers: add Contiguous Memory Allocator mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks mm: extract reclaim code from __alloc_pages_direct_reclaim() mm: Serialize access to min_free_kbytes mm: page_isolation: MIGRATE_CMA isolation functions added mm: mmzone: MIGRATE_CMA migration type added mm: page_alloc: change fallbacks array handling mm: page_alloc: introduce alloc_contig_range() mm: compaction: export some of the functions mm: compaction: introduce isolate_freepages_range() mm: compaction: introduce map_pages() mm: compaction: introduce isolate_migratepages_range() mm: page_alloc: remove trailing whitespace ARM: dma-mapping: add support for IOMMU mapper ARM: dma-mapping: use alloc, mmap, free from dma_ops ARM: dma-mapping: remove redundant code and do the cleanup ... Conflicts: arch/x86/include/asm/dma-mapping.h
2012-05-24Revert "x86/platform: Add a wallclock_init func to x86_platforms ops"Feng Tang
This reverts commit cf8ff6b6ab0e99dd3058852f4ec76a6140abadec. Just found this commit is a function duplicatation of commit 6b617e22 "x86/platform: Add a wallclock_init func to x86_init.timers ops". Let's revert it and sorry for the noise. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-23Merge branch 'delete-mca' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull the MCA deletion branch from Paul Gortmaker: "It was good that we could support MCA machines back in the day, but realistically, nobody is using them anymore. They were mostly limited to 386-sx 16MHz CPU and some 486 class machines and never more than 64MB of RAM. Even the enthusiast hobbyist community seems to have dried up close to ten years ago, based on what you can find searching various websites dedicated to the relatively short lived hardware. So lets remove the support relating to CONFIG_MCA. There is no point carrying this forward, wasting cycles doing routine maintenance on it; wasting allyesconfig build time on validating it, wasting I/O on git grep'ping over it, and so on." Let's see if anybody screams. It generally has compiled, and James Bottomley pointed out that there was a MCA extension from NCR that allowed for up to 4GB of memory and PPro-class machines. So in *theory* there may be users out there. But even James (technically listed as a maintainer) doesn't actually have a system, and while Alan Cox claims to have a machine in his cellar that he offered to anybody who wants to take it off his hands, he didn't argue for keeping MCA support either. So we could bring it back. But somebody had better speak up and talk about how they have actually been using said MCA hardware with modern kernels for us to do that. And David already took the patch to delete all the networking driver code (commit a5e371f61ad3: "drivers/net: delete all code/drivers depending on CONFIG_MCA"). * 'delete-mca' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: MCA: delete all remaining traces of microchannel bus support. scsi: delete the MCA specific drivers and driver code serial: delete the MCA specific 8250 support. arm: remove ability to select CONFIG_MCA
2012-05-23Merge branches 'x86-asm-for-linus', 'x86-cleanups-for-linus', ↵Linus Torvalds
'x86-cpu-for-linus', 'x86-debug-for-linus' and 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull initial trivial x86 stuff from Ingo Molnar. Various random cleanups and trivial fixes. * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Eliminate dead ia32 syscall handlers * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci-calgary_64.c: Remove obsoleted simple_strtoul() usage x86: Don't continue booting if we can't load the specified initrd x86: kernel/dumpstack.c simple_strtoul cleanup x86: kernel/check.c simple_strtoul cleanup debug: Add CONFIG_READABLE_ASM x86: spinlock.h: Remove REG_PTR_MODE * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cache_info: Fix setup of l2/l3 ids * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Avoid double stack traces with show_regs() * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: microcode_core.c simple_strtoul cleanup
2012-05-21X86: integrate CMA with DMA-mapping subsystemMarek Szyprowski
This patch adds support for CMA to dma-mapping subsystem for x86 architecture that uses common pci-dma/pci-nommu implementation. This allows to test CMA on KVM/QEMU and a lot of common x86 boxes. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> CC: Michal Nazarewicz <mina86@mina86.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
2012-05-17MCA: delete all remaining traces of microchannel bus support.Paul Gortmaker
Hardware with MCA bus is limited to 386 and 486 class machines that are now 20+ years old and typically with less than 32MB of memory. A quick search on the internet, and you see that even the MCA hobbyist/enthusiast community has lost interest in the early 2000 era and never really even moved ahead from the 2.4 kernels to the 2.6 series. This deletes anything remaining related to CONFIG_MCA from core kernel code and from the x86 architecture. There is no point in carrying this any further into the future. One complication to watch for is inadvertently scooping up stuff relating to machine check, since there is overlap in the TLA name space (e.g. arch/x86/boot/mca.c). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: James Bottomley <JBottomley@Parallels.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-16x86: Don't continue booting if we can't load the specified initrdPeter Jones
If we've determined we can't do what the user asked, trying to do something else isn't going to make the user's life better. Without this the screen scrolls a bit and then you get a panic anyway, and it's nice not to have so much scroll after the real problem in bug reports. Link: http://lkml.kernel.org/r/1337190206-12121-1-git-send-email-pjones@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-08x86, realmode: read cr4 and EFER from kernel for 64-bit trampolineJarkko Sakkinen
This patch changes 64-bit trampoline so that CR4 and EFER are provided by the kernel instead of using fixed values. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-24-git-send-email-jarkko.sakkinen@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>