summaryrefslogtreecommitdiff
path: root/arch/x86/xen
AgeCommit message (Collapse)Author
2012-07-19xen/x86: avoid updating TLS descriptors if they haven't changedDavid Vrabel
When switching tasks in a Xen PV guest, avoid updating the TLS descriptors if they haven't changed. This improves the speed of context switches by almost 10% as much of the time the descriptors are the same or only one is different. The descriptors written into the GDT by Xen are modified from the values passed in the update_descriptor hypercall so we keep shadow copies of the three TLS descriptors to compare against. lmbench3 test Before After Improvement -------------------------------------------- lat_ctx -s 32 24 7.19 6.52 9% lat_pipe 12.56 11.66 7% Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/x86: add desc_equal() to compare GDT descriptorsDavid Vrabel
Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Moving it to the Xen file] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mm: zero PTEs for non-present MFNs in the initial page tableDavid Vrabel
When constructing the initial page tables, if the MFN for a usable PFN is missing in the p2m then that frame is initially ballooned out. In this case, zero the PTE (as in decrease_reservation() in drivers/xen/balloon.c). This is obviously safe instead of having an valid PTE with an MFN of INVALID_P2M_ENTRY (~0). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mm: do direct hypercall in xen_set_pte() if batching is unavailableDavid Vrabel
In xen_set_pte() if batching is unavailable (because the caller is in an interrupt context such as handling a page fault) it would fall back to using native_set_pte() and trapping and emulating the PTE write. On 32-bit guests this requires two traps for each PTE write (one for each dword of the PTE). Instead, do one mmu_update hypercall directly. During construction of the initial page tables, continue to use native_set_pte() because most of the PTEs being set are in writable and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and using a hypercall for this is very expensive. This significantly improves page fault performance in 32-bit PV guests. lmbench3 test Before After Improvement ---------------------------------------------- lat_pagefault 3.18 us 2.32 us 27% lat_proc fork 356 us 313.3 us 11% Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mce: Register native mce handler as vMCE bounce back pointLiu, Jinsong
When Xen hypervisor inject vMCE to guest, use native mce handler to handle it Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mce: Add mcelog support for Xen platformLiu, Jinsong
When MCA error occurs, it would be handled by Xen hypervisor first, and then the error information would be sent to initial domain for logging. This patch gets error information from Xen hypervisor and convert Xen format error into Linux format mcelog. This logic is basically self-contained, not touching other kernel components. By using tools like mcelog tool users could read specific error information, like what they did under native Linux. To test follow directions outlined in Documentation/acpi/apei/einj.txt Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-06-15Merge tag 'stable/for-linus-3.5-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull five Xen bug-fixes from Konrad Rzeszutek Wilk: - When booting as PVHVM we would try to use PV console - but would not validate the parameters causing us to crash during restore b/c we re-use the wrong event channel. - When booting on machines with SR-IOV PCI bridge we didn't check for the bridge and tried to use it. - Under AMD machines would advertise the APERFMPERF resulting in needless amount of MSRs from the guest. - A global value (xen_released_pages) was not subtracted at bootup when pages were added back in. This resulted in the balloon worker having the wrong account of how many pages were truly released. - Fix dead-lock when xen-blkfront is run in the same domain as xen-blkback. * tag 'stable/for-linus-3.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: mark local pages as FOREIGN in the m2p_override xen/setup: filter APERFMPERF cpuid feature out xen/balloon: Subtract from xen_released_pages the count that is populated. xen/pci: Check for PCI bridge before using it. xen/events: Add WARN_ON when quick lookup found invalid type. xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness. xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN xen/hvc: Collapse error logic.
2012-06-14xen: mark local pages as FOREIGN in the m2p_overrideStefano Stabellini
When the frontend and the backend reside on the same domain, even if we add pages to the m2p_override, these pages will never be returned by mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will always fail, so the pfn of the frontend will be returned instead (resulting in a deadlock because the frontend pages are already locked). INFO: task qemu-system-i38:1085 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-system-i38 D ffff8800cfc137c0 0 1085 1 0x00000000 ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0 ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0 ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0 Call Trace: [<ffffffff81101ee0>] ? __lock_page+0x70/0x70 [<ffffffff81a0fdd9>] schedule+0x29/0x70 [<ffffffff81a0fe80>] io_schedule+0x60/0x80 [<ffffffff81101eee>] sleep_on_page+0xe/0x20 [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81101ed7>] __lock_page+0x67/0x70 [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811867e6>] ? bio_add_page+0x36/0x40 [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60 [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70 [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00 [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00 [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00 [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0 [<ffffffff81104027>] generic_file_aio_read+0x737/0x780 [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0 [<ffffffff811038f0>] ? find_get_pages+0x150/0x150 [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0 [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90 [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0 [<ffffffff811998b8>] do_io_submit+0x708/0xb90 [<ffffffff81199d50>] sys_io_submit+0x10/0x20 [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b The explanation is in the comment within the code: We need to do this because the pages shared by the frontend (xen-blkfront) can be already locked (lock_page, called by do_read_cache_page); when the userspace backend tries to use them with direct_IO, mfn_to_pfn returns the pfn of the frontend, so do_blockdev_direct_IO is going to try to lock the same pages again resulting in a deadlock. A simplified call graph looks like this: pygrub QEMU ----------------------------------------------- do_read_cache_page io_submit | | lock_page ext3_direct_IO | bio_add_page | lock_page Internally the xen-blkback uses m2p_add_override to swizzle (temporarily) a 'struct page' to have a different MFN (so that it can point to another guest). It also can easily find out whether another pfn corresponding to the mfn exists in the m2p, and can set the FOREIGN bit in the p2m, making sure that mfn_to_pfn returns the pfn of the backend. This allows the backend to perform direct_IO on these pages, but as a side effect prevents the frontend from using get_user_pages_fast on them while they are being shared with the backend. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-31xen/setup: filter APERFMPERF cpuid feature outAndre Przywara
Xen PV kernels allow access to the APERF/MPERF registers to read the effective frequency. Access to the MSRs is however redirected to the currently scheduled physical CPU, making consecutive read and compares unreliable. In addition each rdmsr traps into the hypervisor. So to avoid bogus readouts and expensive traps, disable the kernel internal feature flag for APERF/MPERF if running under Xen. This will a) remove the aperfmperf flag from /proc/cpuinfo b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to use the feature to improve scheduling (by default disabled) c) not mislead the cpufreq driver to use the MSRs This does not cover userland programs which access the MSRs via the device file interface, but this will be addressed separately. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-30x86, amd, xen: Avoid NULL pointer paravirt referencesKonrad Rzeszutek Wilk
Stub out MSR methods that aren't actually needed. This fixes a crash as Xen Dom0 on AMD Trinity systems. A bigger patch should be added to remove the paravirt machinery completely for the methods which apparently have no users! Reported-by: Andre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/20120530222356.GA28417@andromeda.dapyr.net Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
2012-05-30xen/balloon: Subtract from xen_released_pages the count that is populated.Konrad Rzeszutek Wilk
We did not take into account that xen_released_pages would be used outside the initial E820 parsing code. As such we would did not subtract from xen_released_pages the count of pages that we had populated back (instead we just did a simple extra_pages = released - populated). The balloon driver uses xen_released_pages to set the initial current_pages count. If this is wrong (too low) then when a new (higher) target is set, the balloon driver will request too many pages from Xen." This fixes errors such as: (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (51 of 512) during bootup and free_memory : 0 where the free_memory should be 128. Acked-by: David Vrabel <david.vrabel@citrix.com> [v1: Per David's review made the git commit better] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-24Merge tag 'stable/for-linus-3.5-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: * Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly. * Fix self-ballooning code, and balloon logic when booting as initial domain. * Move array printing code to generic debugfs * Support XenBus domains. * Lazily free grants when a domain is dead/non-existent. * In M2P code use batching calls Bug-fixes: * Fix NULL dereference in allocation failure path (hvc_xen) * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one." Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of apic ipi interface next to the new apic_id functions. * tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: do not map the same GSI twice in PVHVM guests. hvc_xen: NULL dereference on allocation failure xen: Add selfballoning memory reservation tunable. xenbus: Add support for xenbus backend in stub domain xen/smp: unbind irqworkX when unplugging vCPUs. xen: enter/exit lazy_mmu_mode around m2p_override calls xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep xen: implement IRQ_WORK_VECTOR handler xen: implement apic ipi interface xen/setup: update VA mapping when releasing memory during setup xen/setup: Combine the two hypercall functions - since they are quite similar. xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 xen/gnttab: add deferred freeing logic debugfs: Add support to print u32 array in debugfs xen/p2m: An early bootup variant of set_phys_to_machine xen/p2m: Collapse early_alloc_p2m_middle redundant checks. xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument xen/p2m: Move code around to allow for better re-usage.
2012-05-23Merge branch 'x86-extable-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull exception table generation updates from Ingo Molnar: "The biggest change here is to allow the build-time sorting of the exception table, to speed up booting. This is achieved by the architecture enabling BUILDTIME_EXTABLE_SORT. This option is enabled for x86 and MIPS currently. On x86 a number of fixes and changes were needed to allow build-time sorting of the exception table, in particular a relocation invariant exception table format was needed. This required the abstracting out of exception table protocol and the removal of 20 years of accumulated assumptions about the x86 exception table format. While at it, this tree also cleans up various other aspects of exception handling, such as early(er) exception handling for rdmsr_safe() et al. All in one, as the result of these changes the x86 exception code is now pretty nice and modern. As an added bonus any regressions in this code will be early and violent crashes, so if you see any of those, you'll know whom to blame!" Fix up trivial conflicts in arch/{mips,x86}/Kconfig files due to nearby modifications of other core architecture options. * 'x86-extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) Revert "x86, extable: Disable presorted exception table for now" scripts/sortextable: Handle relative entries, and other cleanups x86, extable: Switch to relative exception table entries x86, extable: Disable presorted exception table for now x86, extable: Add _ASM_EXTABLE_EX() macro x86, extable: Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/xsave.h x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/kvm_host.h x86, extable: Remove the now-unused __ASM_EX_SEC macros x86, extable: Remove open-coded exception table entries in arch/x86/xen/xen-asm_32.S x86, extable: Remove open-coded exception table entries in arch/x86/um/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/usercopy_32.c x86, extable: Remove open-coded exception table entries in arch/x86/lib/putuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/getuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/csum-copy_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_nocache_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/kernel/test_rodata.c x86, extable: Remove open-coded exception table entries in arch/x86/kernel/entry_64.S ...
2012-05-22Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Most of the changes are about helping virtualized guest kernels achieve better performance." Fix up trivial conflicts with the iommu updates to arch/x86/kernel/apic/io_apic.c * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Implement EIO micro-optimization x86/apic: Add apic->eoi_write() callback x86/apic: Use symbolic APIC_EOI_ACK x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it x86/xen/apic: Add missing #include <xen/xen.h> x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQ x86/apic: Fix UP boot crash x86: Conditionally update time when ack-ing pending irqs xen/apic: implement io apic read with hypercall Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'" xen/x86: Implement x86_apic_ops x86/apic: Replace io_apic_ops with x86_io_apic_ops.
2012-05-21Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug cleanups from Thomas Gleixner: "This series is merily a cleanup of code copied around in arch/* and not changing any of the real cpu hotplug horrors yet. I wish I'd had something more substantial for 3.5, but I underestimated the lurking horror..." Fix up trivial conflicts in arch/{arm,sparc,x86}/Kconfig and arch/sparc/include/asm/thread_info_32.h * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits) um: Remove leftover declaration of alloc_task_struct_node() task_allocator: Use config switches instead of magic defines sparc: Use common threadinfo allocator score: Use common threadinfo allocator sh-use-common-threadinfo-allocator mn10300: Use common threadinfo allocator powerpc: Use common threadinfo allocator mips: Use common threadinfo allocator hexagon: Use common threadinfo allocator m32r: Use common threadinfo allocator frv: Use common threadinfo allocator cris: Use common threadinfo allocator x86: Use common threadinfo allocator c6x: Use common threadinfo allocator fork: Provide kmemcache based thread_info allocator tile: Use common threadinfo allocator fork: Provide weak arch_release_[task_struct|thread_info] functions fork: Move thread info gfp flags to header fork: Remove the weak insanity sh: Remove cpu_idle_wait() ...
2012-05-21xen/smp: unbind irqworkX when unplugging vCPUs.Konrad Rzeszutek Wilk
The git commit 1ff2b0c303698e486f1e0886b4d9876200ef8ca5 "xen: implement IRQ_WORK_VECTOR handler" added the functionality to have a per-cpu "irqworkX" for the IPI APIC functionality. However it missed the unbind when a vCPU is unplugged resulting in an orphaned per-cpu interrupt line for unplugged vCPU: 30: 216 0 xen-dyn-event hvc_console 31: 810 4 xen-dyn-event eth0 32: 29 0 xen-dyn-event blkif - 36: 0 0 xen-percpu-ipi irqwork2 - 37: 287 0 xen-dyn-event xenbus + 36: 287 0 xen-dyn-event xenbus NMI: 0 0 Non-maskable interrupts LOC: 0 0 Local timer interrupts SPU: 0 0 Spurious interrupts Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-18x86/xen/apic: Add missing #include <xen/xen.h>Ingo Molnar
This file depends on <xen/xen.h>, but the dependency was hidden due to: <asm/acpi.h> -> <asm/trampoline.h> -> <asm/io.h> -> <xen/xen.h> With the removal of <asm/trampoline.h>, this exposed the missing Cc: Len Brown <lenb@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-7ccybvue6mw6wje3uxzzcglj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-08Merge branch 'smp/threadalloc' into smp/hotplugThomas Gleixner
Reason: Pull in the separate branch which was created so arch/tile can base further work on it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-07Merge branch 'stable/autoballoon.v5.2' into stable/for-linus-3.5Konrad Rzeszutek Wilk
* stable/autoballoon.v5.2: xen/setup: update VA mapping when releasing memory during setup xen/setup: Combine the two hypercall functions - since they are quite similar. xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 xen/p2m: An early bootup variant of set_phys_to_machine xen/p2m: Collapse early_alloc_p2m_middle redundant checks. xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument xen/p2m: Move code around to allow for better re-usage.
2012-05-07xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleepKonrad Rzeszutek Wilk
Provide the registration callback to call in the Xen's ACPI sleep functionality. This means that during S3/S5 we make a hypercall XENPF_enter_acpi_sleep with the proper PM1A/PM1B registers. Based of Ke Yu's <ke.yu@intel.com> initial idea. [ From http://xenbits.xensource.com/linux-2.6.18-xen.hg change c68699484a65 ] [v1: Added Copyright and license] [v2: Added check if PM1A/B the 16-bits MSB contain something. The spec only uses 16-bits but might have more in future] Signed-off-by: Liang Tang <liang.tang@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen: implement IRQ_WORK_VECTOR handlerLin Ming
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen: implement apic ipi interfaceBen Guthro
Map native ipi vector to xen vector. Implement apic ipi interface with xen_send_IPI_one. Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ben Guthro <ben@guthro.net> Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/setup: update VA mapping when releasing memory during setupDavid Vrabel
In xen_memory_setup(), if a page that is being released has a VA mapping this must also be updated. Otherwise, the page will be not released completely -- it will still be referenced in Xen and won't be freed util the mapping is removed and this prevents it from being reallocated at a different PFN. This was already being done for the ISA memory region in xen_ident_map_ISA() but on many systems this was omitting a few pages as many systems marked a few pages below the ISA memory region as reserved in the e820 map. This fixes errors such as: (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152 (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17) Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/setup: Combine the two hypercall functions - since they are quite similar.Konrad Rzeszutek Wilk
They use the same set of arguments, so it is just the matter of using the proper hypercall. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAMKonrad Rzeszutek Wilk
When the Xen hypervisor boots a PV kernel it hands it two pieces of information: nr_pages and a made up E820 entry. The nr_pages value defines the range from zero to nr_pages of PFNs which have a valid Machine Frame Number (MFN) underneath it. The E820 mirrors that (with the VGA hole): BIOS-provided physical RAM map: Xen: 0000000000000000 - 00000000000a0000 (usable) Xen: 00000000000a0000 - 0000000000100000 (reserved) Xen: 0000000000100000 - 0000000080800000 (usable) The fun comes when a PV guest that is run with a machine E820 - that can either be the initial domain or a PCI PV guest, where the E820 looks like the normal thing: BIOS-provided physical RAM map: Xen: 0000000000000000 - 000000000009e000 (usable) Xen: 000000000009ec00 - 0000000000100000 (reserved) Xen: 0000000000100000 - 0000000020000000 (usable) Xen: 0000000020000000 - 0000000020200000 (reserved) Xen: 0000000020200000 - 0000000040000000 (usable) Xen: 0000000040000000 - 0000000040200000 (reserved) Xen: 0000000040200000 - 00000000bad80000 (usable) Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS) .. With that overlaying the nr_pages directly on the E820 does not work as there are gaps and non-RAM regions that won't be used by the memory allocator. The 'xen_release_chunk' helps with that by punching holes in the P2M (PFN to MFN lookup tree) for those regions and tells us that: Freeing 20000-20200 pfn range: 512 pages freed Freeing 40000-40200 pfn range: 512 pages freed Freeing bad80-badf4 pfn range: 116 pages freed Freeing badf6-bae7f pfn range: 137 pages freed Freeing bb000-100000 pfn range: 282624 pages freed Released 283999 pages of unused memory Those 283999 pages are subtracted from the nr_pages and are returned to the hypervisor. The end result is that the initial domain boots with 1GB less memory as the nr_pages has been subtracted by the amount of pages residing within the PCI hole. It can balloon up to that if desired using 'xl mem-set 0 8092', but the balloon driver is not always compiled in for the initial domain. This patch, implements the populate hypercall (XENMEM_populate_physmap) which increases the the domain with the same amount of pages that were released. The other solution (that did not work) was to transplant the MFN in the P2M tree - the ones that were going to be freed were put in the E820_RAM regions past the nr_pages. But the modifications to the M2P array (the other side of creating PTEs) were not carried away. As the hypervisor is the only one capable of modifying that and the only two hypercalls that would do this are: the update_va_mapping (which won't work, as during initial bootup only PFNs up to nr_pages are mapped in the guest) or via the populate hypercall. The end result is that the kernel can now boot with the nr_pages without having to subtract the 283999 pages. On a 8GB machine, with various dom0_mem= parameters this is what we get: no dom0_mem -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init) +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init) dom0_mem=3G -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init) +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init) dom0_mem=max:3G -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init) +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init) And the 'xm list' or 'xl list' now reflect what the dom0_mem= argument is. Acked-by: David Vrabel <david.vrabel@citrix.com> [v2: Use populate hypercall] [v3: Remove debug printks] [v4: Simplify code] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0Konrad Rzeszutek Wilk
Otherwise we can get these meaningless: Freeing bad80-badf4 pfn range: 0 pages freed We also can do this for the summary ones - no point of printing "Set 0 page(s) to 1-1 mapping" Acked-by: David Vrabel <david.vrabel@citrix.com> [v1: Extended to the summary printks] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/pci: don't use PCI BIOS service for configuration space accessesDavid Vrabel
The accessing PCI configuration space with the PCI BIOS32 service does not work in PV guests. On systems without MMCONFIG or where the BIOS hasn't marked the MMCONFIG region as reserved in the e820 map, the BIOS service is probed (even though direct access is preferred) and this hangs. CC: stable@kernel.org Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Fixed compile error when CONFIG_PCI is not set] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEsKonrad Rzeszutek Wilk
If I try to do "cat /sys/kernel/debug/kernel_page_tables" I end up with: BUG: unable to handle kernel paging request at ffffc7fffffff000 IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480 PGD 0 Oops: 0000 [#1] SMP CPU 0 .. snip.. RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000 RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000 which is due to the fact we are trying to access a PFN that is not accessible to us. The reason (at least in this case) was that PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the hypervisor) to point to a read-only linear map of the MFN->PFN array. During our parsing we would get the MFN (a valid one), try to look it up in the MFN->PFN tree and find it invalid and return ~0 as PFN. Then pte_mfn_to_pfn would happilly feed that in, attach the flags and return it back to the caller. 'ptdump_show' bitshifts it and gets and invalid value that it tries to dereference. Instead of doing all of that, we detect the ~0 case and just return !_PAGE_PRESENT. This bug has been in existence .. at least until 2.6.37 (yikes!) CC: stable@kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-07xen/apic: Return the APIC ID (and version) for CPU 0.Konrad Rzeszutek Wilk
On x86_64 on AMD machines where the first APIC_ID is not zero, we get: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled) BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10 BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10 which means that when the ACPI processor driver loads and tries to parse the _Pxx states it fails to do as, as it ends up calling acpi_get_cpuid which does this: for_each_possible_cpu(i) { if (cpu_physical_id(i) == apic_id) return i; } And the bootup CPU, has not been found so it fails and returns -1 for the first CPU - which then subsequently in the loop that "acpi_processor_get_info" does results in returning an error, which means that "acpi_processor_add" failing and per_cpu(processor) is never set (and is NULL). That means that when xen-acpi-processor tries to load (much much later on) and parse the P-states it gets -ENODEV from acpi_processor_register_performance() (which tries to read the per_cpu(processor)) and fails to parse the data. Reported-by-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@amd.com> [v2: Bit-shift APIC ID by 24 bits] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-01xen/apic: implement io apic read with hypercallLin Ming
Implements xen_io_apic_read with hypercall, so it returns proper IO-APIC information instead of fabricated one. Fallback to return an emulated IO_APIC values if hypercall fails. [v2: fallback to return an emulated IO_APIC values if hypercall fails] Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-01Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect ↵Konrad Rzeszutek Wilk
bogus io-apic entries'" This reverts commit 2531d64b6fe2724dc432b67d8dc66bd45621da0b. The two patches: x86/apic: Replace io_apic_ops with x86_io_apic_ops. xen/x86: Implement x86_apic_ops take care of fixing it properly. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-05-01xen/x86: Implement x86_apic_opsKonrad Rzeszutek Wilk
Or rather just implement one different function as opposed to the native one : the read function. We synthesize the values. Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> [v1: Rebased on top of tip/x86/urgent] [v2: Return 0xfd instead of 0xff in the default case] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-27xen: correctly check for pending events when restoring irq flagsDavid Vrabel
In xen_restore_fl_direct(), xen_force_evtchn_callback() was being called even if no events were pending. This resulted in (depending on workload) about a 100 times as many xen_version hypercalls as necessary. Fix this by correcting the sense of the conditional jump. This seems to give a significant performance benefit for some workloads. There is some subtle tricksy "..since the check here is trying to check both pending and masked in a single cmpw, but I think this is correct. It will call check_events now only when the combined mask+pending word is 0x0001 (aka unmasked, pending)." (Ian) CC: stable@kernel.org Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-26xen/smp: Fix crash when booting with ACPI hotplug CPUs.Konrad Rzeszutek Wilk
When we boot on a machine that can hotplug CPUs and we are using 'dom0_max_vcpus=X' on the Xen hypervisor line to clip the amount of CPUs available to the initial domain, we get this: (XEN) Command line: com1=115200,8n1 dom0_mem=8G noreboot dom0_max_vcpus=8 sync_console mce_verbosity=verbose console=com1,vga loglvl=all guest_loglvl=all .. snip.. DMI: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x032.072520111118 07/25/2011 .. snip. SMP: Allowing 64 CPUs, 32 hotplug CPUs installing Xen timer for CPU 7 cpu 7 spinlock event irq 361 NMI watchdog: disabled (cpu7): hardware events not enabled Brought up 8 CPUs .. snip.. [acpi processor finds the CPUs are not initialized and starts calling arch_register_cpu, which creates /sys/devices/system/cpu/cpu8/online] CPU 8 got hotplugged CPU 9 got hotplugged CPU 10 got hotplugged .. snip.. initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 406 usecs calling erst_init+0x0/0x2bb @ 1 [and the scheduler sticks newly started tasks on the new CPUs, but said CPUs cannot be initialized b/c the hypervisor has limited the amount of vCPUS to 8 - as per the dom0_max_vcpus=8 flag. The spinlock tries to kick the other CPU, but the structure for that is not initialized and we crash.] BUG: unable to handle kernel paging request at fffffffffffffed8 IP: [<ffffffff81035289>] xen_spin_lock+0x29/0x60 PGD 180d067 PUD 180e067 PMD 0 Oops: 0002 [#1] SMP CPU 7 Modules linked in: Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc2upstream-00001-gf5154e8 #1 Intel Corporation S2600CP/S2600CP RIP: e030:[<ffffffff81035289>] [<ffffffff81035289>] xen_spin_lock+0x29/0x60 RSP: e02b:ffff8801fb9b3a70 EFLAGS: 00010282 With this patch, we cap the amount of vCPUS that the initial domain can run, to exactly what dom0_max_vcpus=X has specified. In the future, if there is a hypercall that will allow a running domain to expand past its initial set of vCPUS, this patch should be re-evaluated. CC: stable@kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-26xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded.Konrad Rzeszutek Wilk
There are exactly four users of __monitor and __mwait: - cstate.c (which allows acpi_processor_ffh_cstate_enter to be called when the cpuidle API drivers are used. However patch "cpuidle: replace xen access to x86 pm_idle and default_idle" provides a mechanism to disable the cpuidle and use safe_halt. - smpboot (which allows mwait_play_dead to be called). However safe_halt is always used so we skip that. - intel_idle (same deal as above). - acpi_pad.c. This the one that we do not want to run as we will hit the below crash. Why do we want to expose MWAIT_LEAF in the first place? We want it for the xen-acpi-processor driver - which uploads C-states to the hypervisor. If MWAIT_LEAF is set, the cstate.c sets the proper address in the C-states so that the hypervisor can benefit from using the MWAIT functionality. And that is the sole reason for using it. Without this patch, if a module performs mwait or monitor we get this: invalid opcode: 0000 [#1] SMP CPU 2 .. snip.. Pid: 5036, comm: insmod Tainted: G O 3.4.0-rc2upstream-dirty #2 Intel Corporation S2600CP/S2600CP RIP: e030:[<ffffffffa000a017>] [<ffffffffa000a017>] mwait_check_init+0x17/0x1000 [mwait_check] RSP: e02b:ffff8801c298bf18 EFLAGS: 00010282 RAX: ffff8801c298a010 RBX: ffffffffa03b2000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8801c29800d8 RDI: ffff8801ff097200 RBP: ffff8801c298bf18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffffffffa000a000 R14: 0000005148db7294 R15: 0000000000000003 FS: 00007fbb364f2700(0000) GS:ffff8801ff08c000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000179f038 CR3: 00000001c9469000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process insmod (pid: 5036, threadinfo ffff8801c298a000, task ffff8801c29cd7e0) Stack: ffff8801c298bf48 ffffffff81002124 ffffffffa03b2000 00000000000081fd 000000000178f010 000000000178f030 ffff8801c298bf78 ffffffff810c41e6 00007fff3fb30db9 00007fff3fb30db9 00000000000081fd 0000000000010000 Call Trace: [<ffffffff81002124>] do_one_initcall+0x124/0x170 [<ffffffff810c41e6>] sys_init_module+0xc6/0x220 [<ffffffff815b15b9>] system_call_fastpath+0x16/0x1b Code: <0f> 01 c8 31 c0 0f 01 c9 c9 c3 00 00 00 00 00 00 00 00 00 00 00 00 RIP [<ffffffffa000a017>] mwait_check_init+0x17/0x1000 [mwait_check] RSP <ffff8801c298bf18> ---[ end trace 16582fc8a3d1e29a ]--- Kernel panic - not syncing: Fatal exception With this module (which is what acpi_pad.c would hit): MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>"); MODULE_DESCRIPTION("mwait_check_and_back"); MODULE_LICENSE("GPL"); MODULE_VERSION(); static int __init mwait_check_init(void) { __monitor((void *)&current_thread_info()->flags, 0, 0); __mwait(0, 0); return 0; } static void __exit mwait_check_exit(void) { } module_init(mwait_check_init); module_exit(mwait_check_exit); Reported-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-26x86: Use generic idle thread allocationThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20120420124557.246929343@linutronix.de
2012-04-26x86: Add task_struct argument to smp_ops.cpu_upThomas Gleixner
Preparatory patch to use the generic idle thread allocation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20120420124557.176604405@linutronix.de
2012-04-20x86, extable: Remove open-coded exception table entries in ↵H. Peter Anvin
arch/x86/xen/xen-asm_32.S Remove open-coded exception table entries in arch/x86/xen/xen-asm_32.S, and replace them with _ASM_EXTABLE() macros; this will allow us to change the format and type of the exception table entries. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: David Daney <david.daney@cavium.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com
2012-04-20Revert "xen/p2m: m2p_find_override: use list_for_each_entry_safe"Konrad Rzeszutek Wilk
This reverts commit b960d6c43a63ebd2d8518b328da3816b833ee8cc. If we have another thread (very likely) touched the list, we end up hitting a problem "that the next element is wrong because we should be able to cope with that. The problem is that the next->next pointer would be set LIST_POISON1. " (Stefano's comment on the patch). Reverting for now. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-18Merge commit 'c104f1fa1ecf4ee0fc06e31b1f77630b2551be81' into ↵Konrad Rzeszutek Wilk
stable/for-linus-3.4 * commit 'c104f1fa1ecf4ee0fc06e31b1f77630b2551be81': (14566 commits) cpufreq: OMAP: fix build errors: depends on ARCH_OMAP2PLUS sparc64: Eliminate obsolete __handle_softirq() function sparc64: Fix bootup crash on sun4v. kconfig: delete last traces of __enabled_ from autoconf.h Revert "kconfig: fix __enabled_ macros definition for invisible and un-selected symbols" kconfig: fix IS_ENABLED to not require all options to be defined irq_domain: fix type mismatch in debugfs output format staging: android: fix mem leaks in __persistent_ram_init() staging: vt6656: Don't leak memory in drivers/staging/vt6656/ioctl.c::private_ioctl() staging: iio: hmc5843: Fix crash in probe function. panic: fix stack dump print on direct call to panic() drivers/rtc/rtc-pl031.c: enable clock on all ST variants Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()" hugetlb: fix race condition in hugetlb_fault() drivers/rtc/rtc-twl.c: use static register while reading time drivers/rtc/rtc-s3c.c: add placeholder for driver private data drivers/rtc/rtc-s3c.c: fix compilation error MAINTAINERS: add PCDP console maintainer memcg: do not open code accesses to res_counter members drivers/rtc/rtc-efi.c: fix section mismatch warning ...
2012-04-17xen/p2m: m2p_find_override: use list_for_each_entry_safeStefano Stabellini
Use list_for_each_entry_safe and remove the spin_lock acquisition in m2p_find_override. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-17debugfs: Add support to print u32 array in debugfsSrivatsa Vaddagiri
Move the code from Xen to debugfs to make the code common for other users as well. Accked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com> [v1: Fixed rebase issues] [v2: Fixed PPC compile issues] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06Merge tag 'stable/for-linus-3.4-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code." * tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success xen/pciback: fix XEN_PCI_OP_enable_msix result xen/smp: Remove unnecessary call to smp_processor_id() xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' xen: only check xen_platform_pci_unplug if hvm
2012-04-06xen/p2m: An early bootup variant of set_phys_to_machineKonrad Rzeszutek Wilk
During early bootup we can't use alloc_page, so to allocate leaf pages in the P2M we need to use extend_brk. For that we are utilizing the early_alloc_p2m and early_alloc_p2m_middle functions to do the job for us. This function follows the same logic as set_phys_to_machine. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/p2m: Collapse early_alloc_p2m_middle redundant checks.Konrad Rzeszutek Wilk
At the start of the function we were checking for idx != 0 and bailing out. And later calling extend_brk if idx != 0. That is unnecessary so remove that checks. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argumentKonrad Rzeszutek Wilk
For identity cases we want to call reserve_brk only on the boundary conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is to work around identify regions (PCI spaces, gaps in E820) which are not aligned on 2MB regions. However for the case were we want to allocate P2M middle leafs at the early bootup stage, irregardless of this alignment check we need some means of doing that. For that we provide the new argument. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/p2m: Move code around to allow for better re-usage.Konrad Rzeszutek Wilk
We are going to be using the early_alloc_p2m (and early_alloc_p2m_middle) code in follow up patches which are not related to setting identity pages. Hence lets move the code out in its own function and rename them as appropiate. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/smp: Remove unnecessary call to smp_processor_id()Srivatsa S. Bhat
There is an extra and unnecessary call to smp_processor_id() in cpu_bringup(). Remove it. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus ↵Konrad Rzeszutek Wilk
io-apic entries' The above mentioned patch checks the IOAPIC and if it contains -1, then it unmaps said IOAPIC. But under Xen we get this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0 PGD 0 Oops: 0002 [#1] SMP CPU 0 Modules linked in: Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron 1525 /0U990C RIP: e030:[<ffffffff8134e51f>] [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0 RSP: e02b: ffff8800d42cbb70 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001 RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001 RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010 FS: 0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0:000000008005003b CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000) Stack: 00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157 ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384 0000000000000002 0000000000000010 00000000ffffffff 0000000000000000 Call Trace: [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230 [<ffffffff8100a9b2>] ? check_events+0x12+0x20 [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0 [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0 [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30 [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20 [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202 [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40 [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70 [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0 [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20 The reason we are dying is b/c the call acpi_get_override_irq() is used, which returns the polarity and trigger for the IRQs. That function calls mp_find_ioapics to get the 'struct ioapic' structure - which along with the mp_irq[x] is used to figure out the default values and the polarity/trigger overrides. Since the mp_find_ioapics now returns -1 [b/c the IOAPIC is filled with 0xffffffff], the acpi_get_override_irq() stops trying to lookup in the mp_irq[x] the proper INT_SRV_OVR and we can't install the SCI interrupt. The proper fix for this is going in v3.5 and adds an x86_io_apic_ops struct so that platforms can override it. But for v3.4 lets carry this work-around. This patch does that by providing a slightly different variant of the fake IOAPIC entries. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA mapping branch from Marek Szyprowski: "Short summary for the whole series: A few limitations have been identified in the current dma-mapping design and its implementations for various architectures. There exist more than one function for allocating and freeing the buffers: currently these 3 are used dma_{alloc, free}_coherent, dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent. For most of the systems these calls are almost equivalent and can be interchanged. For others, especially the truly non-coherent ones (like ARM), the difference can be easily noticed in overall driver performance. Sadly not all architectures provide implementations for all of them, so the drivers might need to be adapted and cannot be easily shared between different architectures. The provided patches unify all these functions and hide the differences under the already existing dma attributes concept. The thread with more references is available here: http://www.spinics.net/lists/linux-sh/msg09777.html These patches are also a prerequisite for unifying DMA-mapping implementation on ARM architecture with the common one provided by dma_map_ops structure and extending it with IOMMU support. More information is available in the following thread: http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819 More works on dma-mapping framework are planned, especially in the area of buffer sharing and managing the shared mappings (together with the recently introduced dma_buf interface: commit d15bd7ee445d "dma-buf: Introduce dma buffer sharing mechanism"). The patches in the current set introduce a new alloc/free methods (with support for memory attributes) in dma_map_ops structure, which will later replace dma_alloc_coherent and dma_alloc_writecombine functions." People finally started piping up with support for merging this, so I'm merging it as the last of the pending stuff from the merge window. Looks like pohmelfs is going to wait for 3.5 and more external support for merging. * 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: common: DMA-mapping: add NON-CONSISTENT attribute common: DMA-mapping: add WRITE_COMBINE attribute common: dma-mapping: introduce mmap method common: dma-mapping: remove old alloc_coherent and free_coherent methods Hexagon: adapt for dma_map_ops changes Unicore32: adapt for dma_map_ops changes Microblaze: adapt for dma_map_ops changes SH: adapt for dma_map_ops changes Alpha: adapt for dma_map_ops changes SPARC: adapt for dma_map_ops changes PowerPC: adapt for dma_map_ops changes MIPS: adapt for dma_map_ops changes X86 & IA64: adapt for dma_map_ops changes common: dma-mapping: introduce generic alloc() and free() methods