summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-02-19x86/microcode/intel: Guard against stack overflow in the loaderQuentin Casasnovas
mc_saved_tmp is a static array allocated on the stack, we need to make sure mc_saved_count stays within its bounds, otherwise we're overflowing the stack in _save_mc(). A specially crafted microcode header could lead to a kernel crash or potentially kernel execution. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1422964824-22056-1-git-send-email-quentin.casasnovas@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19Merge branch 'tip-x86-kaslr' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull ASLR and kASLR fixes from Borislav Petkov: - Add a global flag announcing KASLR state so that relevant code can do informed decisions based on its setting. (Jiri Kosina) - Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19libceph: kfree() in put_osd() shouldn't depend on authorizerIlya Dryomov
a255651d4cad ("ceph: ensure auth ops are defined before use") made kfree() in put_osd() conditional on the authorizer. A mechanical mistake most likely - fix it. Cc: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19libceph: fix double __remove_osd() problemIlya Dryomov
It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd() Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19rbd: convert to blk-mqChristoph Hellwig
This converts the rbd driver to use the blk-mq infrastructure. Except for switching to a per-request work item this is almost mechanical. This was tested by Alexandre DERUMIER in November, and found to give him 120000 iops, although the only comparism available was an old 3.10 kernel which gave 80000iops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <elder@linaro.org> [idryomov@gmail.com: context, blk_mq_init_queue() EH] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-02-19x86, mm/ASLR: Fix stack randomization on 64-bit systemsHector Marco-Gisbert
The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19Merge branch 'tip-x86-mm' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull boot printout fix from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19x86/mm/init: Fix incorrect page size in init_memory_mapping() printksDave Hansen
With 32-bit non-PAE kernels, we have 2 page sizes available (at most): 4k and 4M. Enabling PAE replaces that 4M size with a 2M one (which 64-bit systems use too). But, when booting a 32-bit non-PAE kernel, in one of our early-boot printouts, we say: init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x37000000-0x373fffff] [mem 0x37000000-0x373fffff] page 2M init_memory_mapping: [mem 0x00100000-0x36ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x36ffffff] page 2M init_memory_mapping: [mem 0x37400000-0x377fdfff] [mem 0x37400000-0x377fdfff] page 4k Which is obviously wrong. There is no 2M page available. This is probably because of a badly-named variable: in the map_range code: PG_LEVEL_2M. Instead of renaming all the PG_LEVEL_2M's. This patch just fixes the printout: init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x37000000-0x373fffff] [mem 0x37000000-0x373fffff] page 4M init_memory_mapping: [mem 0x00100000-0x36ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x36ffffff] page 4M init_memory_mapping: [mem 0x37400000-0x377fdfff] [mem 0x37400000-0x377fdfff] page 4k BRK [0x03206000, 0x03206fff] PGTABLE Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19x86/mm/ASLR: Propagate base load address calculationJiri Kosina
Commit: e2b32e678513 ("x86, kaslr: randomize module base load address") makes the base address for module to be unconditionally randomized in case when CONFIG_RANDOMIZE_BASE is defined and "nokaslr" option isn't present on the commandline. This is not consistent with how choose_kernel_location() decides whether it will randomize kernel load base. Namely, CONFIG_HIBERNATION disables kASLR (unless "kaslr" option is explicitly specified on kernel commandline), which makes the state space larger than what module loader is looking at. IOW CONFIG_HIBERNATION && CONFIG_RANDOMIZE_BASE is a valid config option, kASLR wouldn't be applied by default in that case, but module loader is not aware of that. Instead of fixing the logic in module.c, this patch takes more generic aproach. It introduces a new bootparam setup data_type SETUP_KASLR and uses that to pass the information whether kaslr has been applied during kernel decompression, and sets a global 'kaslr_enabled' variable accordingly, so that any kernel code (module loading, livepatching, ...) can make decisions based on its value. x86 module loader is converted to make use of this flag. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Kees Cook <keescook@chromium.org> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz [ Always dump correct kaslr status when panicking ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19ceph: return error for traceless reply raceYan, Zheng
When we receives traceless reply for request that created new inode, we re-send a lookup request to MDS get information of the newly created inode. (VFS expects FS' callback return an inode in create case) This breaks one request into two requests. Other client may modify or move to the new inode in the middle. When the race happens, ceph_handle_notrace_create() unconditionally links the dentry for 'create' operation to the inode returned by lookup. This may confuse VFS when the inode is a directory (VFS does not allow multiple linkages for directory inode). This patch makes ceph_handle_notrace_create() when it detect a race. This event should be rare and it happens only when we talk to old MDS. Recent MDS does not send traceless reply for request that creates new inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: fix dentry leaksYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: re-send requests when MDS enters reconnecting stageYan, Zheng
So that MDS can check if any request is already completed and process completed requests in clientreplay stage. When completed requests are processed in clientreplay stage, MDS can avoid sending traceless replies. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: show nocephx_require_signatures and notcp_nodelay optionsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19libceph: tcp_nodelay supportChaitanya Huilgol
TCP_NODELAY socket option set on connection sockets, disables Nagle’s algorithm and improves latency characteristics. tcp_nodelay(default)/notcp_nodelay option flags provided to enable/disable setting the socket option. Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com> [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19rbd: do not treat standalone as flattenIlya Dryomov
If the clone is resized down to 0, it becomes standalone. If such resize is carried over while an image is mapped we would detect this and call rbd_dev_parent_put() which means "let go of all parent state, including the spec(s) of parent images(s)". This leads to a mismatch between "rbd info" and sysfs parent fields, so a fix is in order. # rbd create --image-format 2 --size 1 foo # rbd snap create foo@snap # rbd snap protect foo@snap # rbd clone foo@snap bar # DEV=$(rbd map bar) # rbd resize --allow-shrink --size 0 bar # rbd resize --size 1 bar # rbd info bar | grep parent parent: rbd/foo@snap Before: # cat /sys/bus/rbd/devices/0/parent (no parent image) After: # cat /sys/bus/rbd/devices/0/parent pool_id 0 pool_name rbd image_id 10056b8b4567 image_name foo snap_id 2 snap_name snap overlap 0 Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19ceph: fix atomic_open snapdirYan, Zheng
ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value and creates snapdir inode if it's -ENOENT Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: properly mark empty directory as completeYan, Zheng
ceph_add_cap() calls __check_cap_issue(), which clears directory inode' complete flag. so we should set the complete flag for empty directory should be set after calling ceph_add_cap(). Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19client: include kernel version in client metadataYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: provide seperate {inode,file}_operations for snapdirYan, Zheng
remove all unsupported operations from {inode,file}_operations. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: fix request time stamp encodingYan, Zheng
struct timespec uses 'long' to present second and nanosecond. 'long' is 64 bits on 64bits machine. ceph MDS expects time stamp to be encoded as struct ceph_timespec, which uses 'u32' to present second and nanosecond. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: fix reading inline data when i_size > PAGE_SIZEYan, Zheng
when inode has inline data but its size > PAGE_SIZE (it was truncated to larger size), previous direct read code return -EIO. This patch adds code to return zeros for data whose offset > PAGE_SIZE. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)Yan, Zheng
use an atomic variable to track number of sessions, this can avoid block operation inside wait loops. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)Yan, Zheng
we should not do block operation in wait_event_interruptible()'s condition check function, but reading inline data can block. so move the read inline data code to ceph_get_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)Yan, Zheng
check_cap_flush() calls mutex_lock(), which may block. So we can't use it as condition check function for wait_event(); Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19rbd: fix error paths in rbd_dev_refresh()Ilya Dryomov
header_rwsem should be released on errors. Also remove useless rbd_dev->mapping.size != rbd_dev->header.image_size test. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19ceph: improve reference tracking for snaprealmYan, Zheng
When snaprealm is created, its initial reference count is zero. But in some rare cases, the newly created snaprealm is not referenced by anyone. This causes snaprealm with zero reference count not freed. The fix is set reference count of newly snaprealm to 1. The reference is return the function who requests to create the snaprealm. When the function finishes its job, it releases the reference. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: properly zero data pages for file holes.Yan, Zheng
A bug is found in striped_read() of fs/ceph/file.c. striped_read() calls ceph_zero_pape_vector_range(). The first argument, page_align + read + ret, passed to ceph_zero_pape_vector_range() is wrong. When a file has holes, this wrong parameter may cause memory corruption either in kernal space or user space. Kernel space memory may be corrupted in the case of non direct IO; user space memory may be corrupted in the case of direct IO. In the latter case, the application doing direct IO may crash due to memory corruption, as we have experienced. The correct value should be initial_align + read + ret, where intial_align = o_direct ? buf_align : io_align. Compared with page_align, the current page offest, initial_align is the initial page offest, which should be used to calculate the page and offset in ceph_zero_pape_vector_range(). Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: acl: Remove unused functionRickard Strandqvist
Remove the function ceph_get_cached_acl() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-02-19rbd: nuke copy_token()Rickard Strandqvist
It's been largely superseded by dup_token() and unused for over 2 years, identified by cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> [idryomov@redhat.com: changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19ceph: handle SESSION_FORCE_RO messageYan, Zheng
mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19libceph: use mon_client.c/put_generic_request() moreIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19libceph: nuke pool op infrastructureIlya Dryomov
On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <sage@newdream.net> wrote: > On Mon, 22 Dec 2014, Ilya Dryomov wrote: >> Actually, pool op stuff has been unused for over two years - looks like >> it was added for rbd create_snap and that got ripped out in 2012. It's >> unlikely we'd ever need to manage pools or snaps from the kernel client >> so I think it makes sense to nuke it. Sage? > > Yep! Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19s390/mm: align 64-bit PIE binaries to 4GBMartin Schwidefsky
The base address (STACK_TOP / 3 * 2) for a 64-bit program is two thirds into the 4GB segment at 0x2aa00000000. The randomization added on z13 can eat another 1GB of the remaining 1.33GB to the next 4GB boundary. In the worst case 300MB are left for the executable + bss which may cross into the next 4GB segment. This is bad for branch prediction, therefore align the base address to 4GB to give the program more room before it crosses the 4GB boundary. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-19x86/intel/quark: Fix simple_return.cocci warningsFengguang Wu
arch/x86/platform/intel-quark/imr.c:129:1-4: WARNING: end returns can be simpified Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Andy Shevchenko <andy.schevchenko@gmail.com> Cc: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Darren Hart <dvhart@linux.intel.com> Cc: kbuild-all@01.org Link: http://lkml.kernel.org/r/20150219081432.GA21996@waimea Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19x86/intel/quark: Fix ptr_ret.cocci warningsFengguang Wu
arch/x86/platform/intel-quark/imr.c:280:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Andy Shevchenko <andy.schevchenko@gmail.com> Cc: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Darren Hart <dvhart@linux.intel.com> Cc: kbuild-all@01.org Link: http://lkml.kernel.org/r/20150219081432.GA21983@waimea Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19ARM: make of_device_ids constUwe Kleine-König
of_device_ids (i.e. compatible strings and the respective data) are not supposed to change at runtime. All functions working with of_device_ids provided by <linux/of.h> work with const of_device_ids. So mark the non-const structs in arch/arm as const, too. While at it also add some __initconst annotations. Acked-by: Jason Cooper <jason@lakedameon.net> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-02-19ARM: make arrays containing machine compatible strings constUwe Kleine-König
The definition static const char *axxia_dt_match[] __initconst = { ... defines a changable array of constant strings. That is you must not do: *axxia_dt_match[0] = 'k'; but axxia_dt_match[0] = "different string"; is fine. So the annotation __initconst is wrong and yields a compiler error when other really const variables are added with __initconst. As the struct machine_desc member dt_compat is declared as const char *const *dt_compat; making the arrays const is the better alternative over changing all annotations to __initdata. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-02-19cpufreq: s3c: remove last use of resume_clocks callbackArnd Bergmann
Commit 32726d2d550 ("ARM: SAMSUNG: Remove legacy clock code") already removed the callback pointer, but there was one remaining user: drivers/cpufreq/s3c24xx-cpufreq.c: In function 's3c_cpufreq_resume_clocks': drivers/cpufreq/s3c24xx-cpufreq.c:149:14: error: 'struct s3c_cpufreq_info' has no member named 'resume_clocks' cpu_cur.info->resume_clocks(); ^ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 32726d2d550 ("ARM: SAMSUNG: Remove legacy clock code") Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.17+ <stable@vger.kernel.org> # v3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-19cpufreq: s3c: remove incorrect __init annotationsArnd Bergmann
The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register are marked init but are called from a context that might be run after the __init sections are discarded, as the compiler points out: WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init() WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register() This removes the __init markings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-18x86/intel/quark: Add Intel Quark platform supportBryan O'Donoghue
Add Intel Quark platform support. Quark needs to pull down all unlocked IMRs to ensure agreement with the EFI memory map post boot. This patch adds an entry in Kconfig for Quark as a platform and makes IMR support mandatory if selected. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ong, Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reviewed-by: Andy Shevchenko <andy.schevchenko@gmail.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: dvhart@infradead.org Link: http://lkml.kernel.org/r/1422635379-12476-3-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18x86/intel/quark: Add Isolated Memory Regions for Quark X1000Bryan O'Donoghue
Intel's Quark X1000 SoC contains a set of registers called Isolated Memory Regions. IMRs are accessed over the IOSF mailbox interface. IMRs are areas carved out of memory that define read/write access rights to the various system agents within the Quark system. For a given agent in the system it is possible to specify if that agent may read or write an area of memory defined by an IMR with a granularity of 1 KiB. Quark_SecureBootPRM_330234_001.pdf section 4.5 details the concept of IMRs quark-x1000-datasheet.pdf section 12.7.4 details the implementation of IMRs in silicon. eSRAM flush, CPU Snoop write-only, CPU SMM Mode, CPU non-SMM mode, RMU and PCIe Virtual Channels (VC0 and VC1) can have individual read/write access masks applied to them for a given memory region in Quark X1000. This enables IMRs to treat each memory transaction type listed above on an individual basis and to filter appropriately based on the IMR access mask for the memory region. Quark supports eight IMRs. Since all of the DMA capable SoC components in the X1000 are mapped to VC0 it is possible to define sections of memory as invalid for DMA write operations originating from Ethernet, USB, SD and any other DMA capable south-cluster component on VC0. Similarly it is possible to mark kernel memory as non-SMM mode read/write only or to mark BIOS runtime memory as SMM mode accessible only depending on the particular memory footprint on a given system. On an IMR violation Quark SoC X1000 systems are configured to reset the system, so ensuring that the IMR memory map is consistent with the EFI provided memory map is critical to ensure no IMR violations reset the system. The API for accessing IMRs is based on MTRR code but doesn't provide a /proc or /sys interface to manipulate IMRs. Defining the size and extent of IMRs is exclusively the domain of in-kernel code. Quark firmware sets up a series of locked IMRs around pieces of memory that firmware owns such as ACPI runtime data. During boot a series of unlocked IMRs are placed around items in memory to guarantee no DMA modification of those items can take place. Grub also places an unlocked IMR around the kernel boot params data structure and compressed kernel image. It is necessary for the kernel to tear down all unlocked IMRs in order to ensure that the kernel's view of memory passed via the EFI memory map is consistent with the IMR memory map. Without tearing down all unlocked IMRs on boot transitory IMRs such as those used to protect the compressed kernel image will cause IMR violations and system reboots. The IMR init code tears down all unlocked IMRs and sets a protective IMR around the kernel .text and .rodata as one contiguous block. This sanitizes the IMR memory map with respect to the EFI memory map and protects the read-only portions of the kernel from unwarranted DMA access. Tested-by: Ong, Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reviewed-by: Andy Shevchenko <andy.schevchenko@gmail.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: andy.shevchenko@gmail.com Cc: dvhart@infradead.org Link: http://lkml.kernel.org/r/1422635379-12476-2-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18Documentation/x86: Fix path in zero-page.txtAlexander Kuleshov
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Martin Mares <mj@ucw.cz> Link: http://lkml.kernel.org/r/1422689004-13318-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18x86/apic: Fix the devicetree build in certain configsRicardo Ribalda Delgado
Without this patch: LD init/built-in.o arch/x86/built-in.o: In function `dtb_lapic_setup': kernel/devicetree.c:155: undefined reference to `apic_force_enable' Makefile:923: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: David Rientjes <rientjes@google.com> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/1422905231-16067-1-git-send-email-ricardo.ribalda@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18NFSv4.1: Clean up bind_conn_to_sessionTrond Myklebust
We don't need to fake up an entire session in order retrieve the arguments. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-18kprobes/x86: Mark 2 bytes NOP as boostableWang Nan
Currently, x86 kprobes is unable to boost 2 bytes nop like: nopl 0x0(%rax,%rax,1) which is 0x0f 0x1f 0x44 0x00 0x00. Such nops have exactly 5 bytes to hold a relative jmp instruction. Boosting them should be obviously safe. This patch enable boosting such nops by simply updating twobyte_is_boostable[] array. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1423532045-41049-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18NFSv4.1: Always set up a forward channel when binding the sessionTrond Myklebust
Currently, the client requests a back channel or a bidirectional connection when binding a new TCP channel to an existing session. Fix that to ask for a forward channel or bidirectional. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-18NFSv4.1: Don't set up a backchannel if the server didn't agree to do soTrond Myklebust
If the server doesn't agree to out backchannel setup request, then don't set one up. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-18NFSv4.1: Clean up create_sessionTrond Myklebust
Don't decode directly into the shared struct session Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-18uprobes/x86: Fix 2-byte opcode tableDenys Vlasenko
Enabled probing of lar, lsl, popcnt, lddqu, prefetch insns. They should be safe to probe, they throw no exceptions. Enabled probing of 3-byte opcodes 0f 38-3f xx - these are vector isns, so should be safe. Enabled probing of many currently undefined 0f xx insns. At the rate new vector instructions are getting added, we don't want to constantly enable more bits. We want to only occasionally *disable* ones which for some reason can't be probed. This includes 0f 24,26 opcodes, which are undefined since Pentium. On 486, they were "mov to/from test register". Explained more fully what 0f 78,79 opcodes are. Explained what 0f ae opcode is. (It's unclear why we don't allow probing it, but let's not change it for now). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18uprobes/x86: Fix 1-byte opcode tablesDenys Vlasenko
This change fixes 1-byte opcode tables so that only insns for which we have real reasons to disallow probing are marked with unset bits. To that end: Set bits for all prefix bytes. Their setting is ignored anyway - we check the bitmap against OPCODE1(insn), not against first byte. Keeping them set to 0 only confuses code reader with "why we don't support that opcode" question. Thus: enable bytes c4,c5 in 64-bit mode (VEX prefixes). Byte 62 (EVEX prefix) is not yet enabled since insn decoder does not support that yet. For 32-bit mode, enable probing of opcodes 63 (arpl) and d6 (salc). They don't require any special handling. For 64-bit mode, disable 9a and ea - these undefined opcodes were mistakenly left enabled. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>