summaryrefslogtreecommitdiff
path: root/arch/sparc/mm
AgeCommit message (Collapse)Author
2014-10-05sparc64: Define VA hole at run time, rather than at compile time.David S. Miller
Now that we use 4-level page tables, we can provide up to 53-bits of virtual address space to the user. Adjust the VA hole based upon the capabilities of the cpu type probed. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-05sparc64: Switch to 4-level page tables.David S. Miller
This has become necessary with chips that support more than 43-bits of physical addressing. Based almost entirely upon a patch by Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-04sparc64: Fix reversed start/end in flush_tlb_kernel_range()David S. Miller
When we have to split up a flush request into multiple pieces (in order to avoid the firmware range) we don't specify the arguments in the right order for the second piece. Fix the order, or else we get hangs as the code tries to flush "a lot" of entries and we get lockups like this: [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032] [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608 [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000 [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000 Not tainted [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40> [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000 [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006 [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000 [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54 [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0> [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000 [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138 [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000 [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24 [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80> [ 4423.234193] Call Trace: [ 4423.239051] [0000000000530c24] free_vmap_area_noflush+0x64/0x80 [ 4423.251029] [0000000000531a7c] remove_vm_area+0x5c/0x80 [ 4423.261628] [0000000000531b80] __vunmap+0x20/0x120 [ 4423.271352] [000000000071cf18] n_tty_close+0x18/0x40 [ 4423.281423] [00000000007222b0] tty_ldisc_close+0x30/0x60 [ 4423.292183] [00000000007225a4] tty_ldisc_reinit+0x24/0xa0 [ 4423.303120] [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0 [ 4423.314232] [0000000000719aa0] __tty_hangup+0x280/0x3c0 [ 4423.324835] [0000000000724cb4] pty_close+0x134/0x1a0 [ 4423.334905] [000000000071aa24] tty_release+0x104/0x500 [ 4423.345316] [00000000005511d0] __fput+0x90/0x1e0 [ 4423.354701] [000000000047fa54] task_work_run+0x94/0xe0 [ 4423.365126] [0000000000404b44] __handle_signal+0xc/0x2c Fixes: 4ca9a23765da ("sparc64: Guard against flushing openfirmware mappings.") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16sparc64: mem boot option correctionbob picco
The "mem" boot option can result in many unexpected consequences. This patch attempts to prevent boot hangs which have been experienced on T4-4 and T5-8. Basically the boot loader allocates vmlinuz and initrd higher in available OBP physical memory. For example, on a 2Tb T5-8 it isn't possible to boot with mem=20G. The patch utilizes memblock to avoid reserved regions and trim memory which is only free. Other improvements are possible for a multi-node machine. This is a snippet of the boot log with mem=20G on T5-8 with the patch applied: MEMBLOCK configuration: <- before memory reduction memory size = 0x1ffad6ce000 reserved size = 0xa1adf44 memory.cnt = 0xb memory[0x0] [0x00000030400000-0x00003fdde47fff], 0x3fada48000 bytes memory[0x1] [0x00003fdde4e000-0x00003fdde4ffff], 0x2000 bytes memory[0x2] [0x00080000000000-0x00083fffffffff], 0x4000000000 bytes memory[0x3] [0x00100000000000-0x00103fffffffff], 0x4000000000 bytes memory[0x4] [0x00180000000000-0x00183fffffffff], 0x4000000000 bytes memory[0x5] [0x00200000000000-0x00203fffffffff], 0x4000000000 bytes memory[0x6] [0x00280000000000-0x00283fffffffff], 0x4000000000 bytes memory[0x7] [0x00300000000000-0x00303fffffffff], 0x4000000000 bytes memory[0x8] [0x00380000000000-0x00383fffc71fff], 0x3fffc72000 bytes memory[0x9] [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes memory[0xa] [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes reserved[0x1] [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes ... MEMBLOCK configuration: <- after reduction of memory memory size = 0x50a1adf44 reserved size = 0xa1adf44 memory.cnt = 0x4 memory[0x0] [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes memory[0x1] [0x00380004000000-0x0038050d01d74a], 0x50901d74b bytes memory[0x2] [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes memory[0x3] [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes reserved[0x1] [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes ... Early memory node ranges node 7: [mem 0x380000000000-0x38000117dfff] node 7: [mem 0x380004000000-0x380f0d01bfff] node 7: [mem 0x383fffc92000-0x383fffca1fff] node 7: [mem 0x383fffcb4000-0x383fffcb5fff] Could not find start_pfn for node 0 Could not find start_pfn for node 1 Could not find start_pfn for node 2 Could not find start_pfn for node 3 Could not find start_pfn for node 4 Could not find start_pfn for node 5 Could not find start_pfn for node 6 . The patch was tested on T4-1, T5-8 and Jalap?no. Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16sparc64: find_node adjustmentbob picco
We have seen an issue with guest boot into LDOM that causes early boot failures because of no matching rules for node identitity of the memory. I analyzed this on my T4 and concluded there might not be a solution. I saw the issue in mainline too when booting into the control/primary domain - with guests configured. Note, this could be a firmware bug on some older machines. I'll provide a full explanation of the issues below. Should we not find a matching BEST latency group for a real address (RA) then we will assume node 0. On the T4-2 here with the information provided I can't see an alternative. Technically the LDOM shown below should match the MBLOCK to the favorable latency group. However other factors must be considered too. Were the memory controllers configured "fine" grained interleave or "coarse" grain interleaved - T4. Also should a "group" MD node be considered a NUMA node? There has to be at least one Machine Description (MD) "group" and hence one NUMA node. The group can have one or more latency groups (lg) - more than one memory controller. The current code chooses the smallest latency as the most favorable per group. The latency and lg information is in MLGROUP below. MBLOCK is the base and size of the RAs for the machine as fetched from OBP /memory "available" property. My machine has one MBLOCK but more would be possible - with holes? For a T4-2 the following information has been gathered: with LDOM guest MEMBLOCK configuration: memory size = 0x27f870000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes MBLOCK[0]: base[20000000] size[280000000] offset[0] (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values) (note: (RA + offset) & mask = val is the formula to detect a match for the memory controller. should there be no match for find_node node, a return value of -1 resulted for the node - BAD) There is one group. It has these forward links MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000] MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000] NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8]) (note: "val" is the best lg's (smallest latency) "match") no LDOM guest - bare metal MEMBLOCK configuration: memory size = 0xfdf2d0000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes MBLOCK[0]: base[20000000] size[fe0000000] offset[0] there are two groups group node[16d5] MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000] MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000] NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8]) group node[171d] MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000] MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000] NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8]) (note: for this two "group" bare metal machine, 1/2 memory is in group one's lg and 1/2 memory is in group two's lg). Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16sparc64: sun4v TLB error power off eventsbob picco
We've witnessed a few TLB events causing the machine to power off because of prom_halt. In one case it was some nfs related area during rmmod. Another was an mmapper of /dev/mem. A more recent one is an ITLB issue with a bad pagesize which could be a hardware bug. Bugs happen but we should attempt to not power off the machine and/or hang it when possible. This is a DTLB error from an mmapper of /dev/mem: [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1 SUN4V-DTLB: TPC<0xfffff80100903e6c> SUN4V-DTLB: O7[fffff801081979d0] SUN4V-DTLB: O7<0xfffff801081979d0> SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2] . This is recent mainline for ITLB: [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc> [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8] [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8> [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4] . Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call prom_halt() and drop us to OF command prompt "ok". This isn't the case for LDOMs and the machine powers off. For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel(). The logic of this patch was partially inspired by David Miller's feedback. Power off of large sparc64 machines is painful. Plus die_if_kernel provides more context. A reset sequence isn't a brief period on large sparc64 but better than power-off/power-on sequence. Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05sparc64: Fix up merge thinko.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcDavid S. Miller
Conflicts: arch/sparc/mm/init_64.c Conflict was simple non-overlapping additions. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04sparc64: Guard against flushing openfirmware mappings.David S. Miller
Based almost entirely upon a patch by Christopher Alexander Tobias Schulze. In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap layer") lazy VMAP tlb flushing was added to the vmalloc layer. This causes problems on sparc64. Sparc64 has two VMAP mapped regions and they are not contiguous with eachother. First we have the malloc mapping area, then another unrelated region, then the vmalloc region. This "another unrelated region" is where the firmware is mapped. If the lazy TLB flushing logic in the vmalloc code triggers after we've had both a module unload and a vfree or similar, it will pass an address range that goes from somewhere inside the malloc region to somewhere inside the vmalloc region, and thus covering the openfirmware area entirely. The sparc64 kernel learns about openfirmware's dynamic mappings in this region early in the boot, and then services TLB misses in this area. But openfirmware has some locked TLB entries which are not mentioned in those dynamic mappings and we should thus not disturb them. These huge lazy TLB flush ranges causes those openfirmware locked TLB entries to be removed, resulting in all kinds of problems including hard hangs and crashes during reboot/reset. Besides causing problems like this, such huge TLB flush ranges are also incredibly inefficient. A plea has been made with the author of the VMAP lazy TLB flushing code, but for now we'll put a safety guard into our flush_tlb_kernel_range() implementation. Since the implementation has become non-trivial, stop defining it as a macro and instead make it a function in a C source file. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04sparc64: Do not insert non-valid PTEs into the TSB hash table.David S. Miller
The assumption was that update_mmu_cache() (and the equivalent for PMDs) would only be called when the PTE being installed will be accessible by the user. This is not true for code paths originating from remove_migration_pte(). There are dire consequences for placing a non-valid PTE into the TSB. The TLB miss frramework assumes thatwhen a TSB entry matches we can just load it into the TLB and return from the TLB miss trap. So if a non-valid PTE is in there, we will deadlock taking the TLB miss over and over, never satisfying the miss. Just exit early from update_mmu_cache() and friends in this situation. Based upon a report and patch from Christopher Alexander Tobias Schulze. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21sparc64 - add mem to iomem resourcebob picco
This patch adds sparc RAM to /proc/iomem. It also identifies the code, data and bss regions of the kernel. Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds
Pull sparc fixes from David Miller: "Sparc sparse fixes from Sam Ravnborg" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits) sparc64: fix sparse warnings in int_64.c sparc64: fix sparse warning in ftrace.c sparc64: fix sparse warning in kprobes.c sparc64: fix sparse warning in kgdb_64.c sparc64: fix sparse warnings in compat_audit.c sparc64: fix sparse warnings in init_64.c sparc64: fix sparse warnings in aes_glue.c sparc: fix sparse warnings in smp_32.c + smp_64.c sparc64: fix sparse warnings in perf_event.c sparc64: fix sparse warnings in kprobes.c sparc64: fix sparse warning in tsb.c sparc64: clean up compat_sigset_t.seta handling sparc64: fix sparse "Should it be static?" warnings in signal32.c sparc64: fix sparse warnings in sys_sparc32.c sparc64: fix sparse warning in pci.c sparc64: fix sparse warnings in smp_64.c sparc64: fix sparse warning in prom_64.c sparc64: fix sparse warning in btext.c sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c sparc64: fix sparse warning in process_64.c ... Conflicts: arch/sparc/include/asm/pgtable_64.h
2014-06-04hugetlb: restrict hugepage_migration_support() to x86_64Naoya Horiguchi
Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-18sparc64: fix sparse warnings in int_64.cSam Ravnborg
Fix following warnings: init_64.c:798:5: warning: symbol 'numa_cpu_lookup_table' was not declared. Should it be static? init_64.c:799:11: warning: symbol 'numa_cpumask_lookup_table' was not declared. Should it be static? The warnings were present with an allnoconfig Fix so the variables are only declared if CONFIG_NEED_MULTIPLE_NODES is defined. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18sparc64: fix sparse warnings in init_64.cSam Ravnborg
Fix following warnings: init_64.c:191:10: warning: symbol 'dcpage_flushes' was not declared. Should it be static? init_64.c:193:10: warning: symbol 'dcpage_flushes_xcall' was not declared. Should it be static? Add extern declaration to asm/setup.h and drop local declaration in smp_64.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18sparc64: fix sparse warning in tsb.cSam Ravnborg
Fix following warning: tsb.c:290:5: warning: symbol 'sysctl_tsb_ratio' was not declared. Should it be static? Add extern declaration in asm/setup.h and remove local declaration in kernel/sysctl.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.cSam Ravnborg
Fix following warnings: kernel/sys_sparc_64.c:643:17: warning: symbol 'sys_kern_features' was not declared. Should it be static? kernel/unaligned_64.c:297:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static? kernel/unaligned_64.c:387:5: warning: symbol 'handle_popc' was not declared. Should it be static? kernel/unaligned_64.c:428:5: warning: symbol 'handle_ldf_stq' was not declared. Should it be static? kernel/unaligned_64.c:553:6: warning: symbol 'handle_ld_nf' was not declared. Should it be static? kernel/unaligned_64.c:579:6: warning: symbol 'handle_lddfmna' was not declared. Should it be static? kernel/unaligned_64.c:643:6: warning: symbol 'handle_stdfmna' was not declared. Should it be static? Functions that are only used in kernel/ - add prototypes in kernel.h Functions used outside kernel/ - add prototype in asm/setup.h Removed local prototypes One of the local prototypes had wrong signature (return void - not int). Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18sparc: drop use of extern for prototypes in arch/sparc/*Sam Ravnborg
Drop the remaining uses of extern for prototypes in .h files in the sparc specific part of the kernel tree. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18sparc32: fix sparse warning in io-unit.cSam Ravnborg
Fix following warning: io-unit.c:56:13: warning: incorrect type in assignment (different address spaces) The page table for the io unit resides in __iomem. Fix up all users of the io unit page table. Introduce sbus helers for all read/write operations. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18sparc32: fix sparse warning in iommu.cSam Ravnborg
Fix following warning: iommu.c:69:21: warning: incorrect type in assignment (different address spaces) iommu_struct.regs is __iomem - fix up all users. Introduce sbus operations for all read/write operations. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.David S. Miller
Access to the TSB hash tables during TLB misses requires that there be an atomic 128-bit quad load available so that we fetch a matching TAG and DATA field at the same time. On cpus prior to UltraSPARC-III only virtual address based quad loads are available. UltraSPARC-III and later provide physical address based variants which are easier to use. When we only have virtual address based quad loads available this means that we have to lock the TSB into the TLB at a fixed virtual address on each cpu when it runs that process. We can't just access the PAGE_OFFSET based aliased mapping of these TSBs because we cannot take a recursive TLB miss inside of the TLB miss handler without risking running out of hardware trap levels (some trap combinations can be deep, such as those generated by register window spill and fill traps). Without huge pages it's working perfectly fine, but when the huge TSB got added another chunk of fixed virtual address space was not allocated for this second TSB mapping. So we were mapping both the 8K and 4MB TSBs to the same exact virtual address, causing multiple TLB matches which gives undefined behavior. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-06sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault ↵David S. Miller
addresses. This was found using Dave Jone's trinity tool. When a user process which is 32-bit performs a load or a store, the cpu chops off the top 32-bits of the effective address before translating it. This is because we run 32-bit tasks with the PSTATE_AM (address masking) bit set. We can't run the kernel with that bit set, so when the kernel accesses userspace no address masking occurs. Since a 32-bit process will have no mappings in that region we will properly fault, so we don't try to handle this using access_ok(), which can safely just be a NOP on sparc64. Real faults from 32-bit processes should never generate such addresses so a bug check was added long ago, and it barks in the logs if this happens. But it also barks when a kernel user access causes this condition, and that _can_ happen. For example, if a pointer passed into a system call is "0xfffffffc" and the kernel access 4 bytes offset from that pointer. Just handle such faults normally via the exception entries. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03sparc64: Use 'ILOG2_4MB' instead of constant '22'.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03sparc64: Fix top-level fault handling bugs.David S. Miller
Make get_user_insn() able to cope with huge PMDs. Next, make do_fault_siginfo() more robust when get_user_insn() can't actually fetch the instruction. In particular, use the MMU announced fault address when that happens, instead of calling compute_effective_address() and computing garbage. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03sparc64: Fix bugs in get_user_pages_fast() wrt. THP.David S. Miller
The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to decide if it needs to bail and return 0. pmd_large() should therefore just check _PAGE_PMD_HUGE. Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we just need to add a valid bit check. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03sparc64: Fix huge PMD invalidation.David S. Miller
On sparc64 "present" and "valid" are seperate PTE bits, this allows us to naturally distinguish between the user explicitly asking for PROT_NONE with mprotect() and other situations. However we weren't handling this properly in the huge PMD paths. First of all, the page table walker in the TSB miss path only checks for _PAGE_PMD_HUGE. So the generic pmdp_invalidate() would clear _PAGE_PRESENT but the TLB miss paths would still load it into the TLB as a valid huge PMD. Fix this by clearing the valid bit in pmdp_invalidate(), and also checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez" since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03sparc64: Fix executable bit testing in set_pmd_at() paths.David S. Miller
This code was mistakenly using the exec bit from the PMD in all cases, even when the PMD isn't a huge PMD. If it's not a huge PMD, test the exec bit in the individual ptes down in tlb_batch_pmd_scan(). Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse warnings in unaligned_32.cSam Ravnborg
Fix following warnings: unaligned_32.c:146:15: warning: symbol 'safe_compute_effective_address' was not declared. Should it be static? unaligned_32.c:235:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static? unaligned_32.c:319:17: warning: symbol 'user_unaligned_trap' was not declared. Should it be static? Add proper declarations in kernel.h + setup.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse warning in devices.cSam Ravnborg
Fix following warning: devices.c:114:13: warning: symbol 'device_scan' was not declared. Should it be static? Add prototype to asm/setup.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse warnings in setup_32.cSam Ravnborg
Fix following warnings: setup_32.c:106:15: warning: symbol 'cmdline_memory_size' was not declared. Should it be static? setup_32.c:270:16: warning: symbol 'fake_swapper_regs' was not declared. Should it be static? setup_32.c:368:55: warning: Using plain integer as NULL pointer Add missing declaration of cmdline_memory_size and remove the local one in init_32.c fake_swapper_regs was only used locally - so defined static. When replacing 0 with NULL also add a few spaces around operators Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse "Should it be static?" in mm/Sam Ravnborg
Fix following warnings: srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static? iommu.c:430:13: warning: symbol 'ld_mmu_iommu' was not declared. Should it be static? leon_mm.c:21:5: warning: symbol 'srmmu_swprobe_trace' was not declared. Should it be static? Add proper prototypes or define static to fix them. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse warnings in srmmu.cSam Ravnborg
Fix following warnings: srmmu.c:78:5: warning: symbol 'flush_page_for_dma_global' was not declared. Should it be static? srmmu.c:85:5: warning: symbol 'viking_mxcc_present' was not declared. Should it be static? srmmu.c:103:6: warning: symbol 'srmmu_nocache_bitmap' was not declared. Should it be static? srmmu.c:176:24: warning: Using plain integer as NULL pointer srmmu.c:731:46: warning: Using plain integer as NULL pointer srmmu.c:731:46: warning: Using plain integer as NULL pointer srmmu.c:731:46: warning: Using plain integer as NULL pointer srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static? Add proper prototypes in mm_32.h and drop local prototype in init_32.c Replace 0 with NULL Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse warning in init_32.cSam Ravnborg
Fix following warning: init_32.c:112:22: warning: symbol 'bootmem_init' was not declared. Should it be static? Fix by adding a proper prototype in pgtable_32.h and drop the local prototype in srmmu.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: fix sparse warning in fault_32.cSam Ravnborg
Fix following warning: fault_32.c:38:24: error: symbol 'unhandled_fault' redeclared with different type - different modifiers When this warning was fixed several new warnings popped up - fix them too. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29sparc32: rename mm/srmmu.h to mm/mm_32.hSam Ravnborg
This file will be used for more than just srmmu stuff, so the old name was misleading. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17sparc64:tsb.c:use array size macro rather than numberDoug Wilson
This is a small patch which uses ARRAY_SIZE macro rather than a number to make code readability better. Signed-off-by: Doug Wilson <doug.lkml@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19sparc32: make copy_to/from_user_page() usable from modular codePaul Gortmaker
While copy_to/from_user_page() users are uncommon, there is one in drivers/staging/lustre/lustre/libcfs/linux/linux-curproc.c which leads to the following: ERROR: "sparc32_cachetlb_ops" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined! during routine allmodconfig build coverage. The reason this happens is as follows: In arch/sparc/include/asm/cacheflush_32.h we have: #define flush_cache_page(vma,addr,pfn) \ sparc32_cachetlb_ops->cache_page(vma, addr) #define copy_to_user_page(vma, page, vaddr, dst, src, len) \ do { \ flush_cache_page(vma, vaddr, page_to_pfn(page));\ memcpy(dst, src, len); \ } while (0) #define copy_from_user_page(vma, page, vaddr, dst, src, len) \ do { \ flush_cache_page(vma, vaddr, page_to_pfn(page));\ memcpy(dst, src, len); \ } while (0) However, sparc32_cachetlb_ops isn't exported and hence the error. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-28sparc: delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21memblock: make memblock_set_node() support different memblock_typeTang Chen
[sfr@canb.auug.org.au: fix powerpc build] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-18sparc64: merge fixStephen Rothwell
After merging the final tree, today's linux-next build (sparc64 defconfig) failed like this: arch/sparc/mm/init_64.c: In function 'pte_alloc_one': arch/sparc/mm/init_64.c:2568:9: error: unused variable 'pte' [-Werror=unused-variable] Caused by the merge between commit 37b3a8ff3e08 ("sparc64: Move from 4MB to 8MB huge pages") and commit 1ae9ae5f7df7 ("sparc: handle pgtable_page_ctor() fail") (I had the following merge fix in linux-next, but it didn't seem to propagate upstream - may have forgotten to point it out :-(). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds
Pull sparc update from David Miller: 1) Implement support for up to 47-bit physical addresses on sparc64. 2) Support HAVE_CONTEXT_TRACKING on sparc64, from Kirill Tkhai. 3) Fix Simba bridge window calculations, from Kjetil Oftedal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Implement HAVE_CONTEXT_TRACKING sparc64: Add self-IPI support for smp_send_reschedule() sparc: PCI: Fix incorrect address calculation of PCI Bridge windows on Simba-bridges sparc64: Encode huge PMDs using PTE encoding. sparc64: Move to 64-bit PGDs and PMDs. sparc64: Move from 4MB to 8MB huge pages. sparc64: Make PAGE_OFFSET variable. sparc64: Fix inconsistent max-physical-address defines. sparc64: Document the shift counts used to validate linear kernel addresses. sparc64: Define PAGE_OFFSET in terms of physical address bits. sparc64: Use PAGE_OFFSET instead of a magic constant. sparc64: Clean up 64-bit mmap exclusion defines.
2013-11-15sparc: handle pgtable_page_ctor() failKirill A. Shutemov
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15mm, thp: do not access mm->pmd_huge_pte directlyKirill A. Shutemov
Currently mm->pmd_huge_pte protected by page table lock. It will not work with split lock. We have to have per-pmd pmd_huge_pte for proper access serialization. For now, let's just introduce wrapper to access mm->pmd_huge_pte. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-14sparc64: Implement HAVE_CONTEXT_TRACKINGKirill Tkhai
Mark the places when the system are in user or are in kernel. This is used to make full dynticks system (tickless) -- CONFIG_NO_HZ_FULL dependence. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-13sparc64: Encode huge PMDs using PTE encoding.David S. Miller
Now that we have 64-bits for PMDs we can stop using special encodings for the huge PMD values, and just put real PTEs in there. We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and huge ones. It is the same for both 4U and 4V PTE layouts. We also use _PAGE_SPECIAL to indicate the splitting state, since a huge PMD cannot also be special. All of the PMD --> PTE translation code disappears, and most of the huge PMD bit modifications and tests just degenerate into the PTE operations. In particular USER_PGTABLE_CHECK_PMD_HUGE becomes trivial. As a side effect, normal PMDs don't shift the physical address around. This also speeds up the page table walks in the TLB miss paths since they don't have to do the shifts any more. Another non-trivial aspect is that pte_modify() has to be changed to preserve the _PAGE_PMD_HUGE bits as well as the page size field of the pte. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12sparc64: Move to 64-bit PGDs and PMDs.David S. Miller
To make the page tables compact, we were using 32-bit PGDs and PMDs. We only had to support <= 43 bits of physical addresses so this was quite feasible. In order to support larger physical addresses we have to move to 64-bit PGDs and PMDs. Most of the changes are straight-forward: 1) {pgd,pmd}_t --> unsigned long 2) Anything that tries to use plain "unsigned int" types with pgd/pmd values needs to be adjusted. In particular things like "0U" become "0UL". 3) {PGDIR,PMD}_BITS decrease by one. 4) In the assembler page table walkers, use "ldxa" instead of "lduwa" and adjust the low bit masks to clear out the low 3 bits instead of just the low 2 bits during pgd/pmd address formation. Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the swapper_{pg_dir,low_pmd_dir} arrays. This patch does not try to take advantage of having 64-bits in the PMDs to simplify the hugepage code, that will come in a subsequent change. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12sparc64: Move from 4MB to 8MB huge pages.David S. Miller
The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12sparc64: Make PAGE_OFFSET variable.David S. Miller
Choose PAGE_OFFSET dynamically based upon cpu type. Original UltraSPARC-I (spitfire) chips only supported a 44-bit virtual address space. Newer chips (T4 and later) support 52-bit virtual addresses and up to 47-bits of physical memory space. Therefore we have to adjust PAGE_SIZE dynamically based upon the capabilities of the chip. Note that this change alone does not allow us to support > 43-bit physical memory, to do that we need to re-arrange our page table support. The current encodings of the pmd_t and pgd_t pointers restricts us to "32 + 11" == 43 bits. This change can waste quite a bit of memory for the various tables. In particular, a future change should work to size and allocate kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically. This isn't easy as we really cannot take a TLB miss when accessing kern_linear_bitmap[]. We'd have to lock it into the TLB or similar. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12sparc64: Document the shift counts used to validate linear kernel addresses.David S. Miller
This way we can see exactly what they are derived from, and in particular how they would change if we were to use a different PAGE_OFFSET value. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12sparc64: Use PAGE_OFFSET instead of a magic constant.David S. Miller
This pertains to all of the computations of the kernel fast TLB miss xor values. Based upon a patch by Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>