summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/uprobes.c
AgeCommit message (Collapse)Author
2015-06-09x86: Make is_64bit_mm() widely availableDave Hansen
The uprobes code has a nice helper, is_64bit_mm(), that consults both the runtime and compile-time flags for 32-bit support. Instead of reinventing the wheel, pull it in to an x86 header so we can use it for MPX. I prefer passing the 'mm' around to test_thread_flag(TIF_IA32) because it makes it explicit where the context is coming from. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'Andy Lutomirski
user_mode_vm() and user_mode() are now the same. Change all callers of user_mode_vm() to user_mode(). The next patch will remove the definition of user_mode_vm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org [ Merged to a more recent kernel. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18uprobes/x86: Fix 2-byte opcode tableDenys Vlasenko
Enabled probing of lar, lsl, popcnt, lddqu, prefetch insns. They should be safe to probe, they throw no exceptions. Enabled probing of 3-byte opcodes 0f 38-3f xx - these are vector isns, so should be safe. Enabled probing of many currently undefined 0f xx insns. At the rate new vector instructions are getting added, we don't want to constantly enable more bits. We want to only occasionally *disable* ones which for some reason can't be probed. This includes 0f 24,26 opcodes, which are undefined since Pentium. On 486, they were "mov to/from test register". Explained more fully what 0f 78,79 opcodes are. Explained what 0f ae opcode is. (It's unclear why we don't allow probing it, but let's not change it for now). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18uprobes/x86: Fix 1-byte opcode tablesDenys Vlasenko
This change fixes 1-byte opcode tables so that only insns for which we have real reasons to disallow probing are marked with unset bits. To that end: Set bits for all prefix bytes. Their setting is ignored anyway - we check the bitmap against OPCODE1(insn), not against first byte. Keeping them set to 0 only confuses code reader with "why we don't support that opcode" question. Thus: enable bytes c4,c5 in 64-bit mode (VEX prefixes). Byte 62 (EVEX prefix) is not yet enabled since insn decoder does not support that yet. For 32-bit mode, enable probing of opcodes 63 (arpl) and d6 (salc). They don't require any special handling. For 64-bit mode, disable 9a and ea - these undefined opcodes were mistakenly left enabled. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18uprobes/x86: Add comment with insn opcodes, mnemonics and why we dont ↵Denys Vlasenko
support them After adding these, it's clear we have some awkward choices there. Some valid instructions are prohibited from uprobing while several invalid ones are allowed. Hopefully future edits to the good-opcode tables will fix wrong bits or explain why those bits are not wrong. No actual code changes. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-18x86: Remove arbitrary instruction size limit in instruction decoderDave Hansen
The current x86 instruction decoder steps along through the instruction stream but always ensures that it never steps farther than the largest possible instruction size (MAX_INSN_SIZE). The MPX code is now going to be doing some decoding of userspace instructions. We copy those from userspace in to the kernel and they're obviously completely untrusted coming from userspace. In addition to the constraint that instructions can only be so long, we also have to be aware of how long the buffer is that came in from userspace. This _looks_ to be similar to what the perf and kprobes is doing, but it's unclear to me whether they are affected. The whole reason we need this is that it is perfectly valid to be executing an instruction within MAX_INSN_SIZE bytes of an unreadable page. We should be able to gracefully handle short reads in those cases. This adds support to the decoder to record how long the buffer being decoded is and to refuse to "validate" the instruction if we would have gone over the end of the buffer to decode it. The kprobes code probably needs to be looked at here a bit more carefully. This patch still respects the MAX_INSN_SIZE limit there but the kprobes code does look like it might be able to be a bit more strict than it currently is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: x86@kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-05uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updatesOleg Nesterov
Purely cosmetic, no changes in .o, 1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to ->defparam. 2. Add the comment into default_post_xol_op() to explain "regs->sp +=". 3. Remove the stale part of the comment in arch_uprobe_analyze_insn(). Suggested-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14uprobes/x86: Fix scratch register selection for rip-relative fixupsDenys Vlasenko
Before this patch, instructions such as div, mul, shifts with count in CL, cmpxchg are mishandled. This patch adds vex prefix handling. In particular, it avoids colliding with register operand encoded in vex.vvvv field. Since we need to avoid two possible register operands, the selection of scratch register needs to be from at least three registers. After looking through a lot of CPU docs, it looks like the safest choice is SI,DI,BX. Selecting BX needs care to not collide with implicit use of BX by cmpxchg8b. Test-case: #include <stdio.h> static const char *const pass[] = { "FAIL", "pass" }; long two = 2; void test1(void) { long ax = 0, dx = 0; asm volatile("\n" " xor %%edx,%%edx\n" " lea 2(%%edx),%%eax\n" // We divide 2 by 2. Result (in eax) should be 1: " probe1: .globl probe1\n" " divl two(%%rip)\n" // If we have a bug (eax mangled on entry) the result will be 2, // because eax gets restored by probe machinery. : "=a" (ax), "=d" (dx) /*out*/ : "0" (ax), "1" (dx) /*in*/ : "memory" /*clobber*/ ); dprintf(2, "%s: %s\n", __func__, pass[ax == 1] ); } long val2 = 0; void test2(void) { long old_val = val2; long ax = 0, dx = 0; asm volatile("\n" " mov val2,%%eax\n" // eax := val2 " lea 1(%%eax),%%edx\n" // edx := eax+1 // eax is equal to val2. cmpxchg should store edx to val2: " probe2: .globl probe2\n" " cmpxchg %%edx,val2(%%rip)\n" // If we have a bug (eax mangled on entry), val2 will stay unchanged : "=a" (ax), "=d" (dx) /*out*/ : "0" (ax), "1" (dx) /*in*/ : "memory" /*clobber*/ ); dprintf(2, "%s: %s\n", __func__, pass[val2 == old_val + 1] ); } long val3[2] = {0,0}; void test3(void) { long old_val = val3[0]; long ax = 0, dx = 0; asm volatile("\n" " mov val3,%%eax\n" // edx:eax := val3 " mov val3+4,%%edx\n" " mov %%eax,%%ebx\n" // ecx:ebx := edx:eax + 1 " mov %%edx,%%ecx\n" " add $1,%%ebx\n" " adc $0,%%ecx\n" // edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3: " probe3: .globl probe3\n" " cmpxchg8b val3(%%rip)\n" // If we have a bug (edx:eax mangled on entry), val3 will stay unchanged. // If ecx:edx in mangled, val3 will get wrong value. : "=a" (ax), "=d" (dx) /*out*/ : "0" (ax), "1" (dx) /*in*/ : "cx", "bx", "memory" /*clobber*/ ); dprintf(2, "%s: %s\n", __func__, pass[val3[0] == old_val + 1 && val3[1] == 0] ); } int main(int argc, char **argv) { test1(); test2(); test3(); return 0; } Before this change all tests fail if probe{1,2,3} are probed. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14uprobes/x86: Simplify rip-relative handlingDenys Vlasenko
It is possible to replace rip-relative addressing mode with addressing mode of the same length: (reg+disp32). This eliminates the need to fix up immediate and correct for changing instruction length. And we can kill arch_uprobe->def.riprel_target. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Simplify riprel_{pre,post}_xol() and make them similarOleg Nesterov
Ignoring the "correction" logic riprel_pre_xol() and riprel_post_xol() are very similar but look quite differently. 1. Add the "UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX" check at the start of riprel_pre_xol(), like the same check in riprel_post_xol(). 2. Add the trivial scratch_reg() helper which returns the address of scratch register pre_xol/post_xol need to change. 3. Change these functions to use the new helper and avoid copy-and-paste under if/else branches. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Kill the "autask" arg of riprel_pre_xol()Oleg Nesterov
default_pre_xol_op() passes &current->utask->autask to riprel_pre_xol() and this is just ugly because it still needs to load current->utask to read ->vaddr. Remove this argument, change riprel_pre_xol() to use current->utask. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Rename *riprel* helpers to make the naming consistentOleg Nesterov
handle_riprel_insn(), pre_xol_rip_insn() and handle_riprel_post_xol() look confusing and inconsistent. Rename them into riprel_analyze(), riprel_pre_xol(), and riprel_post_xol() respectively. No changes in compiled code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Cleanup the usage of UPROBE_FIX_IP/UPROBE_FIX_CALLOleg Nesterov
Now that UPROBE_FIX_IP/UPROBE_FIX_CALL are mutually exclusive we can use a single "fix_ip_or_call" enum instead of 2 fix_* booleans. This way the logic looks more understandable and clean to me. While at it, join "case 0xea" with other "ip is correct" ret/lret cases. Also change default_post_xol_op() to use "else if" for the same reason. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logicOleg Nesterov
The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL was 0xe8 "call relative", and now it is handled by branch_xol_ops. So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push the address of next insn == utask->vaddr + insn.length, just we need to record insn.length into the new auprobe->def.ilen member. Note: if/when we teach branch_xol_ops to support jcxz/loopz we can remove the "correction" logic, UPROBE_FIX_IP can use the same address. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Introduce push_ret_address()Oleg Nesterov
Extract the "push return address" code from branch_emulate_op() into the new simple helper, push_ret_address(). It will have more users. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8Oleg Nesterov
handle_riprel_insn() assumes that nobody else could modify ->fixups before. This is correct but fragile, change it to use "|=". Also make ->fixups u8, we are going to add the new members into the union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte, redefine them so that they can fit into u8. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30uprobes/x86: Move default_xol_ops's data into arch_uprobe->defOleg Nesterov
Finally we can move arch_uprobe->fixups/rip_rela_target_address into the new "def" struct and place this struct in the union, they are only used by default_xol_ops paths. The patch also renames rip_rela_target_address to riprel_target just to make this name shorter. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30uprobes/x86: Move UPROBE_FIX_SETF logic from arch_uprobe_post_xol() to ↵Oleg Nesterov
default_post_xol_op() UPROBE_FIX_SETF is only needed to handle "popf" correctly but it is processed by the generic arch_uprobe_post_xol() code. This doesn't allows us to make ->fixups private for default_xol_ops. 1 Change default_post_xol_op(UPROBE_FIX_SETF) to set ->saved_tf = T. "popf" always reads the flags from stack, it doesn't matter if TF was set or not before single-step. Ignoring the naming, this is even more logical, "saved_tf" means "owned by application" and we do not own this flag after "popf". 2. Change arch_uprobe_post_xol() to save ->saved_tf into the local "bool send_sigtrap" before ->post_xol(). 3. Change arch_uprobe_post_xol() to ignore UPROBE_FIX_SETF and just check ->saved_tf after ->post_xol(). With this patch ->fixups and ->rip_rela_target_address are only used by default_xol_ops hooks, we are ready to remove them from the common part of arch_uprobe. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30uprobes/x86: Don't use arch_uprobe_abort_xol() in arch_uprobe_post_xol()Oleg Nesterov
014940bad8e4 "uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails" changed arch_uprobe_post_xol() to use arch_uprobe_abort_xol() if ->post_xol fails. This was correct and helped to avoid the additional complications, we need to clear X86_EFLAGS_TF in this case. However, now that we have uprobe_xol_ops->abort() hook it would be better to avoid arch_uprobe_abort_xol() here. ->post_xol() should likely do what ->abort() does anyway, we should not do the same work twice. Currently only handle_riprel_post_xol() can be called twice, this is unnecessary but safe. Still this is not clean and can lead to the problems in future. Change arch_uprobe_post_xol() to clear X86_EFLAGS_TF and restore ->ip by hand and avoid arch_uprobe_abort_xol(). This temporary uglifies the usage of autask.saved_tf, we will cleanup this later. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30uprobes/x86: Introduce uprobe_xol_ops->abort() and default_abort_op()Oleg Nesterov
arch_uprobe_abort_xol() calls handle_riprel_post_xol() even if auprobe->ops != default_xol_ops. This is fine correctness wise, only default_pre_xol_op() can set UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX and otherwise handle_riprel_post_xol() is nop. But this doesn't look clean and this doesn't allow us to move ->fixups into the union in arch_uprobe. Move this handle_riprel_post_xol() call into the new default_abort_op() hook and change arch_uprobe_abort_xol() accordingly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30uprobes/x86: Don't change the task's state if ->pre_xol() failsOleg Nesterov
Currently this doesn't matter, the only ->pre_xol() hook can't fail, but we need to fix arch_uprobe_pre_xol() anyway. If ->pre_xol() fails we should not change regs->ip/flags, we should just return the error to make restart actually possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30uprobes/x86: Fix is_64bit_mm() with CONFIG_X86_X32Oleg Nesterov
is_64bit_mm() assumes that mm->context.ia32_compat means the 32-bit instruction set, this is not true if the task is TIF_X32. Change set_personality_ia32() to initialize mm->context.ia32_compat by TIF_X32 or TIF_IA32 instead of 1. This allows to fix is_64bit_mm() without affecting other users, they all treat ia32_compat as "bool". TIF_ in ->ia32_compat looks a bit strange, but this is grep-friendly and avoids the new define's. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Make good_insns_* depend on CONFIG_X86_*Oleg Nesterov
Add the suitable ifdef's around good_insns_* arrays. We do not want to add the ugly ifdef's into their only user, uprobe_init_insn(), so the "#else" branch simply defines them as NULL. This doesn't generate the extra code, gcc is smart enough, although the code is fine even if it could not detect that (without CONFIG_IA32_EMULATION) is_64bit_mm() is __builtin_constant_p(). The patch looks more complicated because it also moves good_insns_64 up close to good_insns_32. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Shift "insn_complete" from branch_setup_xol_ops() to ↵Oleg Nesterov
uprobe_init_insn() Change uprobe_init_insn() to make insn_complete() == T, this makes other insn_get_*() calls unnecessary. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Add is_64bit_mm(), kill validate_insn_bits()Oleg Nesterov
1. Extract the ->ia32_compat check from 64bit validate_insn_bits() into the new helper, is_64bit_mm(), it will have more users. TODO: this checks is actually wrong if mm owner is X32 task, we need another fix which changes set_personality_ia32(). TODO: even worse, the whole 64-or-32-bit logic is very broken and the fix is not simple, we need the nontrivial changes in the core uprobes code. 2. Kill validate_insn_bits() and change its single caller to use uprobe_init_insn(is_64bit_mm(mm). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Add uprobe_init_insn(), kill validate_insn_{32,64}bits()Oleg Nesterov
validate_insn_32bits() and validate_insn_64bits() are very similar, turn them into the single uprobe_init_insn() which has the additional "bool x86_64" argument which can be passed to insn_init() and used to choose between good_insns_64/good_insns_32. Also kill UPROBE_FIX_NONE, it has no users. Note: the current code doesn't use ifdef's consistently, good_insns_64 depends on CONFIG_X86_64 but good_insns_32 is unconditional. This patch removes ifdef around good_insns_64, we will add it back later along with the similar one for good_insns_32. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30uprobes/x86: Refuse to attach uprobe to "word-sized" branch insnsDenys Vlasenko
All branch insns on x86 can be prefixed with the operand-size override prefix, 0x66. It was only ever useful for performing jumps to 32-bit offsets in 16-bit code segments. In 32-bit code, such instructions are useless since they cause IP truncation to 16 bits, and in case of call insns, they save only 16 bits of return address and misalign the stack pointer as a "bonus". In 64-bit code, such instructions are treated differently by Intel and AMD CPUs: Intel ignores the prefix altogether, AMD treats them the same as in 32-bit mode. Before this patch, the emulation code would execute the instructions as if they have no 0x66 prefix. With this patch, we refuse to attach uprobes to such insns. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-17uprobes/x86: Emulate relative conditional "near" jmp'sOleg Nesterov
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10 if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same condition. Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are correct no matter if this jmp is "near" or "short". Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Emulate relative conditional "short" jmp'sOleg Nesterov
Teach branch_emulate_op() to emulate the conditional "short" jmp's which check regs->flags. Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz. They all are rel8 and thus they can't trigger the problem, but perhaps we will add the support in future just for completeness. Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Emulate relative call'sOleg Nesterov
See the previous "Emulate unconditional relative jmp's" which explains why we can not execute "jmp" out-of-line, the same applies to "call". Emulating of rip-relative call is trivial, we only need to additionally push the ret-address. If this fails, we execute this instruction out of line and this should trigger the trap, the probed application should die or the same insn will be restarted if a signal handler expands the stack. We do not even need ->post_xol() for this case. But there is a corner (and almost theoretical) case: another thread can expand the stack right before we execute this insn out of line. In this case it hit the same problem we are trying to solve. So we simply turn the probed insn into "call 1f; 1:" and add ->post_xol() which restores ->sp and restarts. Many thanks to Jonathan who finally found the standalone reproducer, otherwise I would never resolve the "random SIGSEGV's under systemtap" bug-report. Now that the problem is clear we can write the simplified test-case: void probe_func(void), callee(void); int failed = 1; asm ( ".text\n" ".align 4096\n" ".globl probe_func\n" "probe_func:\n" "call callee\n" "ret" ); /* * This assumes that: * * - &probe_func = 0x401000 + a_bit, aligned = 0x402000 * * - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000 * as xol_add_vma() asks; the 1st slot = 0x7fffffffe080 * * so we can target the non-canonical address from xol_vma using * the simple math below, 100 * 4096 is just the random offset */ asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n"); void callee(void) { failed = 0; } int main(void) { probe_func(); return failed; } It SIGSEGV's if you probe "probe_func" (although this is not very reliable, randomize_va_space/etc can change the placement of xol area). Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8) differently. This patch relies on lib/insn.c and thus implements the intel's behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use this insn, so we postpone the fix until we decide what should we do; emulate or not, support or not, etc. Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Emulate nop's using ops->emulate()Oleg Nesterov
Finally we can kill the ugly (and very limited) code in __skip_sstep(). Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn. Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes "(rep;)+ nop;" at least, and (afaics) much more. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Emulate unconditional relative jmp'sOleg Nesterov
Currently we always execute all insns out-of-line, including relative jmp's and call's. This assumes that even if regs->ip points to nowhere after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will update it correctly. However, this doesn't work if this regs->ip == xol_vaddr + insn_offset is not canonical. In this case CPU generates #GP and general_protection() kills the task which tries to execute this insn out-of-line. Now that we have uprobe_xol_ops we can teach uprobes to emulate these insns and solve the problem. This patch adds branch_xol_ops which has a single branch_emulate_op() hook, so far it can only handle rel8/32 relative jmp's. TODO: move ->fixup into the union along with rip_rela_target_address. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jonathan Lebon <jlebon@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and ↵Oleg Nesterov
arch_uretprobe_hijack_return_addr() 1. Add the trivial sizeof_long() helper and change other callers of is_ia32_task() to use it. TODO: is_ia32_task() is not what we actually want, TS_COMPAT does not necessarily mean 32bit. Fortunately syscall-like insns can't be probed so it actually works, but it would be better to rename and use is_ia32_frame(). 2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr() and adjust_ret_addr() should be named "nleft". And in fact only the last copy_to_user() in arch_uretprobe_hijack_return_addr() actually needs to inspect the non-zero error code. TODO: adjust_ret_addr() should die. We can always calculate the value we need to write into *regs->sp, just UPROBE_FIX_CALL should record insn->length. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Teach arch_uprobe_post_xol() to restart if possibleOleg Nesterov
SIGILL after the failed arch_uprobe_post_xol() should only be used as a last resort, we should try to restart the probed insn if possible. Currently only adjust_ret_addr() can fail, and this can only happen if another thread unmapped our stack after we executed "call" out-of-line. Most probably the application if buggy, but even in this case it can have a handler for SIGSEGV/etc. And in theory it can be even correct and do something non-trivial with its memory. Of course we can't restart unconditionally, so arch_uprobe_post_xol() does this only if ->post_xol() returns -ERESTART even if currently this is the only possible error. default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim pointed out it should not forget to pop off the return address pushed by this insn executed out-of-line. Note: this is not "perfect", we do not want the extra handler_chain() after restart, but I think this is the best solution we can realistically do without too much uglifications. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Send SIGILL if arch_uprobe_post_xol() failsOleg Nesterov
Currently the error from arch_uprobe_post_xol() is silently ignored. This doesn't look good and this can lead to the hard-to-debug problems. 1. Change handle_singlestep() to loudly complain and send SIGILL. Note: this only affects x86, ppc/arm can't fail. 2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and avoid TF games if it is going to return an error. This can help to to analyze the problem, if nothing else we should not report ->ip = xol_slot in the core-file. Note: this means that handle_riprel_post_xol() can be called twice, but this is fine because it is idempotent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Conditionalize the usage of handle_riprel_insn()Oleg Nesterov
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start, but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic. Move the callsite into "default" case and change the "0xff" case to fall-through. We are going to add the various hooks to handle the rip-relative jmp/call instructions (and more), we need this change to enforce the fact that the new code can not conflict with is_riprel_insn() logic which, after this change, can only be used by default_xol_ops. Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol() directly. This is fine unless another _xol_ops we may add later will need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup. In this case we can add uprobe_xol_ops->abort() hook, which (perhaps) we will need anyway in the long term. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->opsOleg Nesterov
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops", move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default set of methods and change arch_uprobe_pre/post_xol() accordingly. This way we can add the new uprobe_xol_ops's to handle the insns which need the special processing (rip-relative jmp/call at least). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooksOleg Nesterov
No functional changes. Preparation to simplify the review of the next change. Just reorder the code in arch_uprobe_pre/post_xol() functions so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end. Also change arch_uprobe_pre_xol() to use utask instead of autask, to make the code more symmetrical with arch_uprobe_post_xol(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17uprobes/x86: Gather "riprel" functions togetherOleg Nesterov
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to the closely related handle_riprel_insn(). This way it is simpler to read and understand this code, and this lessens the number of ifdef's. While at it, update the comment in handle_riprel_post_xol() as Jim suggested. TODO: rename them somehow to make the naming consistent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove ↵Oleg Nesterov
"mm" arg Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if it is true insn_rip_relative() must return false. validate_insn_bits() passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative() checks insn->x86_64. Also, remove the no longer needed "struct mm_struct *mm" argument and the unnecessary "return" at the end. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()Oleg Nesterov
No functional changes, preparation. Shift the code from prepare_fixups() to arch_uprobe_analyze_insn() with the following modifications: - Do not call insn_get_opcode() again, it was already called by validate_insn_bits(). - Move "case 0xea" up. This way "case 0xff" can fall through to default case. - change "case 0xff" to use the nested "switch (MODRM_REG)", this way the code looks a bit simpler. - Make the comments look consistent. While at it, kill the initialization of rip_rela_target_address and ->fixups, we can rely on kzalloc(). We will add the new members into arch_uprobe, it would be better to assume that everything is zero by default. TODO: cleanup/fix the mess in validate_insn_bits() paths: - validate_insn_64bits() and validate_insn_32bits() should be unified. - "ifdef" is not used consistently; if good_insns_64 depends on CONFIG_X86_64, then probably good_insns_32 should depend on CONFIG_X86_32/EMULATION - the usage of mm->context.ia32_compat looks wrong if the task is TIF_X32. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-04-13uretprobes/x86: Hijack return addressAnton Arapov
Hijack the return address and replace it with a trampoline address. Signed-off-by: Anton Arapov <anton@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-02-08uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain()Oleg Nesterov
Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is what consumer->handler() needs but uprobe_get_swbp_addr() is not exported. This also simplifies the code and makes it more consistent across the supported architectures. handle_swbp() becomes the only caller of uprobe_get_swbp_addr(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
2013-02-08uprobes/x86: Change __skip_sstep() to actually skip the whole insnOleg Nesterov
__skip_sstep() doesn't update regs->ip. Currently this is correct but only "by accident" and it doesn't skip the whole insn. Change it to advance ->ip by the length of the detected 0x66*0x90 sequence. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-11-03uprobes/x86: Cleanup the single-stepping codeOleg Nesterov
No functional changes. Now that default arch_uprobe_enable/disable_step() helpers do nothing, x86 has no reason to reimplement them. Change arch_uprobe_*_xol() hooks to do the necessary work and remove the x86-specific hooks. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-10-07uprobes/x86: Only rep+nop can be emulated correctlyOleg Nesterov
__skip_sstep() correctly detects the "nontrivial" nop insns, but since it doesn't update regs->ip we can not really skip "0x0f 0x1f | 0x0f 0x19 | 0x87 0xc0", the probed application is killed by SIGILL'ed handle_swbp(). Remove these additional checks. If we want to implement this correctly we need to know the full insn length to update ->ip. rep* + nop is fine even without updating ->ip. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Fix arch_uprobe_disable_step() && UTASK_SSTEP_TRAPPED interactionOleg Nesterov
arch_uprobe_disable_step() should also take UTASK_SSTEP_TRAPPED into account. In this case the probed insn was not executed, we need to clear X86_EFLAGS_TF if it was set by us and that is all. Again, this code will look more clean when we move it into arch_uprobe_post_xol() and arch_uprobe_abort_xol(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Xol should send SIGTRAP if X86_EFLAGS_TF was setOleg Nesterov
arch_uprobe_disable_step() correctly preserves X86_EFLAGS_TF and returns to user-mode. But this means the application gets SIGTRAP only after the next insn. This means that UPROBE_CLEAR_TF logic is not really right. _enable should only record the state of X86_EFLAGS_TF, and _disable should check it separately from UPROBE_FIX_SETF. Remove arch_uprobe_task->restore_flags, add ->saved_tf instead, and change enable/disable accordingly. This assumes that the probed insn was not trapped, see the next patch. arch_uprobe_skip_sstep() logic has the same problem, change it to check X86_EFLAGS_TF and send SIGTRAP as well. We will cleanup this all after we fold enable/disable_step into pre/post_hol hooks. Note: send_sig(SIGTRAP) is not actually right, we need send_sigtrap(). But this needs more changes, handle_swbp() does the same and this is equally wrong. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Do not (ab)use TIF_SINGLESTEP/user_*_single_step() for ↵Oleg Nesterov
single-stepping user_enable/disable_single_step() was designed for ptrace, it assumes a single user and does unnecessary and wrong things for uprobes. For example: - arch_uprobe_enable_step() can't trust TIF_SINGLESTEP, an application itself can set X86_EFLAGS_TF which must be preserved after arch_uprobe_disable_step(). - we do not want to set TIF_SINGLESTEP/TIF_FORCED_TF in arch_uprobe_enable_step(), this only makes sense for ptrace. - otoh we leak TIF_SINGLESTEP if arch_uprobe_disable_step() doesn't do user_disable_single_step(), the application will be killed after the next syscall. - arch_uprobe_enable_step() does access_process_vm() we do not need/want. Change arch_uprobe_enable/disable_step() to set/clear X86_EFLAGS_TF directly, this is much simpler and more correct. However, we need to clear TIF_BLOCKSTEP/DEBUGCTLMSR_BTF before executing the probed insn, add set_task_blockstep(false). Note: with or without this patch, there is another (hopefully minor) problem. A probed "pushf" insn can see the wrong X86_EFLAGS_TF set by uprobes. Perhaps we should change _disable to update the stack, or teach arch_uprobe_skip_sstep() to emulate this insn. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Implement x86 specific arch_uprobe_*_stepSebastian Andrzej Siewior
The arch specific implementation behaves like user_enable_single_step() except that it does not disable single stepping if it was already enabled by ptrace. This allows the debugger to single step over an uprobe. The state of block stepping is not restored. It makes only sense together with TF and if that was enabled then the debugger is notified. Note: this is still not correct. For example, TIF_SINGLESTEP check is not right, the application itself can set X86_EFLAGS_TF. And otoh we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf". See the next patches, we need the changes in arch/x86/kernel/step.c first. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>