
Kprobe principle analysis

Introduction to kprobes

KProbes is a debugging mechanism for the Linux kernel which can also be used for monitoring events inside a production system. You can use it to weed out performance bottlenecks, log specific events, trace problems etc. KProbes was developed by IBM as an underlying mechanism for another higher level tracing tool called DProbes. DProbes adds a number of features, including its own scripting language for the writing of probe handlers. However, only KProbes has been merged into the standard kernel.
...
The figure to the right describes the architecture of KProbes. On the x86, KProbes makes use of the exception handling mechanisms and modifies the standard breakpoint, debug and a few other exception handlers for its own purpose. Most of the handling of the probes is done in the context of the breakpoint and the debug exception handlers which make up the KProbes architecture dependent layer. The KProbes architecture independent layer is the KProbes manager which is used to register and unregister probes. Users provide probe handlers in kernel modules which register probes through the KProbes manager.
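
The split between the two layers can be pictured with a small user-space model. This is only a sketch under assumptions: every `toy_` name is hypothetical and none of it is kernel API. `toy_register` and `toy_unregister` play the role of the manager, while `toy_breakpoint_hit` stands in for the architecture-dependent breakpoint handler asking the manager whether a probe exists at the trapping address. Unlike real kprobes, which can place several probes on one address, this toy rejects duplicates.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical stand-in for a registered probe; not the kernel's struct kprobe. */
struct toy_probe {
    uintptr_t addr;                       /* probed address */
    void (*handler)(struct toy_probe *p); /* optional user-supplied handler */
    int hits;                             /* how often this probe was hit */
    struct toy_probe *next;               /* hash-bucket chain */
};

static struct toy_probe *toy_table[TOY_TABLE_SIZE];

static size_t toy_hash(uintptr_t addr)
{
    return (size_t)(addr >> 2) % TOY_TABLE_SIZE;
}

/* Manager: find the probe registered at addr, or NULL. */
static struct toy_probe *toy_get_probe(uintptr_t addr)
{
    struct toy_probe *p = toy_table[toy_hash(addr)];

    while (p != NULL && p->addr != addr)
        p = p->next;
    return p;
}

/* Manager: register a probe; returns -1 if the address is already probed. */
static int toy_register(struct toy_probe *p)
{
    size_t bucket = toy_hash(p->addr);

    if (toy_get_probe(p->addr) != NULL)
        return -1;
    p->next = toy_table[bucket];
    toy_table[bucket] = p;
    return 0;
}

/* Manager: unlink a probe from its bucket, if it is present. */
static void toy_unregister(struct toy_probe *p)
{
    struct toy_probe **link = &toy_table[toy_hash(p->addr)];

    while (*link != NULL) {
        if (*link == p) {
            *link = p->next;
            p->next = NULL;
            return;
        }
        link = &(*link)->next;
    }
}

/* "Arch layer": called on a breakpoint trap; returns 1 if a probe handled it. */
static int toy_breakpoint_hit(uintptr_t addr)
{
    struct toy_probe *p = toy_get_probe(addr);

    if (p == NULL)
        return 0; /* not ours: let the normal breakpoint path deal with it */
    p->hits++;
    if (p->handler != NULL)
        p->handler(p);
    return 1;
}
```

Registering two probes whose addresses fall into the same bucket, then removing one, exercises both the lookup and the unlink path.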

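Kprobes is a debugging mechanism in the Linux kernel that can also be used to monitor kernel events in production. It lets users register custom callbacks at specified kernel locations to capture kernel events and to filter and analyze kernel data, which makes the kernel observable. As a kernel tracing mechanism, kprobes is commonly used for performance and security monitoring, most typically in HIDS/CWPP/EDR agents. It differs fundamentally from cruder, more invasive monitoring approaches such as replacing the system call table (yulong is a typical example): because the kernel itself supports the mechanism, it is comparatively stable and elegant (though less so than eBPF). If the probe points are well chosen (the fields involved rarely change) and compatibility is good enough, building an agent on kprobes gives it an edge in both performance and security (it is harder to bypass than user-space monitoring). A typical example is the open-source project agent-smith, whose agent is widely deployed at ByteDance, reportedly at a scale of 100,000+ deployments. I won't dwell on that; what follows are my notes from learning the core concepts.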

How kprobes works

Kprobes

The Kprobes family comprises three kinds of probes: kprobe, kretprobe, and jprobe. Each has its own use case, summarized briefly below:

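kprobe: lets the user register a callback at almost any location in the kernel. Some locations are off limits; for example, kprobes cannot be used to probe its own implementation.

kretprobe: lets the user run a callback when the probed kernel function returns.

jprobe: lets the user run a callback at the entry of the probed kernel function, with access to that function's arguments.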

kprobe

Execution flow

How Does a Kprobe Work?
-----------------------

When a kprobe is registered, Kprobes makes a copy of the probed
instruction and replaces the first byte(s) of the probed instruction
with a breakpoint instruction (e.g., int3 on i386 and x86_64).

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
registers are saved, and control passes to Kprobes via the
notifier_call_chain mechanism. Kprobes executes the "pre_handler"
associated with the kprobe, passing the handler the addresses of the
kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction.
(It would be simpler to single-step the actual instruction in place,
but then Kprobes would have to temporarily remove the breakpoint
instruction. This would open a small time window when another CPU
could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the
"post_handler," if any, that is associated with the kprobe.
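Execution then continues with the instruction following the probepoint.

1. When a kprobe is registered, Kprobes saves a copy of the probed instruction and replaces its first byte(s) with a breakpoint (int3 on x86). The pre_handler to run is recorded in the kprobe struct.
2. When the CPU reaches the int3, a trap occurs, the registers are saved, and the breakpoint exception handler runs.
3. The exception handler passes control to Kprobes through the notifier_call_chain mechanism, and Kprobes calls the pre_handler associated with the kprobe that was hit.
4. The pre_handler receives the address of the kprobe struct and the saved registers as its arguments.
5. After the pre_handler returns, Kprobes single-steps its saved copy of the probed instruction.
6. When the single step completes, a debug exception is raised, and Kprobes runs the post_handler, if there is one.
7. After the post_handler returns, execution continues with the instruction following the probe point.
8. When the kprobe is no longer needed, the original bytes are copied back to the target address, which restores the probed function to its initial state.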
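
The steps above can be sketched as a user-space toy. This is a model under stated assumptions, not kernel code: the "code" is just a byte array, all `toy_` names are hypothetical, and the trap and single-step are ordinary function calls rather than CPU exceptions. The `trace` field records the order in which the pre-handler (`P`), the single-stepped copy (`S`), and the post-handler (`Q`) run.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BREAKPOINT 0xCC /* opcode of int3 on x86 */

struct toy_kprobe;
typedef void (*toy_handler_t)(struct toy_kprobe *p);

/* Hypothetical stand-in for struct kprobe: one probe on one byte of "code". */
struct toy_kprobe {
    uint8_t *addr;              /* probed location */
    uint8_t saved_opcode;       /* copy of the original byte */
    toy_handler_t pre_handler;  /* runs before the probed instruction */
    toy_handler_t post_handler; /* runs after the probed instruction */
    char trace[8];              /* order of events, for illustration */
    size_t trace_len;
};

static void toy_trace(struct toy_kprobe *p, char c)
{
    if (p->trace_len + 1 < sizeof(p->trace)) {
        p->trace[p->trace_len++] = c;
        p->trace[p->trace_len] = '\0';
    }
}

static void toy_pre(struct toy_kprobe *p)  { toy_trace(p, 'P'); }
static void toy_post(struct toy_kprobe *p) { toy_trace(p, 'Q'); }

/* Step 1: save the original byte, then plant the breakpoint. */
static int toy_register_kprobe(struct toy_kprobe *p)
{
    if (p->addr == NULL || *p->addr == TOY_BREAKPOINT)
        return -1; /* nothing to probe, or already probed */
    p->saved_opcode = *p->addr;
    *p->addr = TOY_BREAKPOINT;
    return 0;
}

/* Step 8: copy the original byte back. */
static void toy_unregister_kprobe(struct toy_kprobe *p)
{
    *p->addr = p->saved_opcode;
}

/*
 * Steps 2-7: "execute" the byte at pc and return the opcode that
 * effectively ran. The breakpoint itself is never returned for a probed
 * location; the saved copy is "single-stepped" instead.
 */
static uint8_t toy_execute(uint8_t *pc, struct toy_kprobe *p)
{
    uint8_t executed;

    if (p == NULL || p->addr != pc || *pc != TOY_BREAKPOINT)
        return *pc;              /* no probe here: run the byte as is */
    if (p->pre_handler != NULL)
        p->pre_handler(p);       /* steps 3-4 */
    executed = p->saved_opcode;  /* step 5: single-step the copy */
    toy_trace(p, 'S');
    if (p->post_handler != NULL)
        p->post_handler(p);      /* step 6 */
    return executed;             /* step 7: caller moves on */
}
```

Note that the breakpoint stays in place for the whole hit, which mirrors the reason the kernel documentation gives for single-stepping a copy instead of the original instruction.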

Example

Here is the sample source from the samples directory of the Linux kernel tree:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

#define MAX_SYMBOL_LEN 64
/* Default probe target. _do_fork was renamed kernel_clone in Linux 5.10. */
static char symbol[MAX_SYMBOL_LEN] = "_do_fork";
module_param_string(symbol, symbol, sizeof(symbol), 0644);

/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
    .symbol_name = symbol,
};

/* kprobe pre_handler: called just before the probed instruction is executed */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
#ifdef CONFIG_X86
    pr_info("<%s> pre_handler: p->addr = 0x%p, ip = %lx, flags = 0x%lx\n",
            p->symbol_name, p->addr, regs->ip, regs->flags);
#endif
#ifdef CONFIG_PPC
    pr_info("<%s> pre_handler: p->addr = 0x%p, nip = 0x%lx, msr = 0x%lx\n",
            p->symbol_name, p->addr, regs->nip, regs->msr);
#endif
#ifdef CONFIG_MIPS
    pr_info("<%s> pre_handler: p->addr = 0x%p, epc = 0x%lx, status = 0x%lx\n",
            p->symbol_name, p->addr, regs->cp0_epc, regs->cp0_status);
#endif
#ifdef CONFIG_TILEGX
    pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx, ex1 = 0x%lx\n",
            p->symbol_name, p->addr, regs->pc, regs->ex1);
#endif
#ifdef CONFIG_ARM64
    pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx,"
            " pstate = 0x%lx\n",
            p->symbol_name, p->addr, (long)regs->pc, (long)regs->pstate);
#endif
#ifdef CONFIG_S390
    pr_info("<%s> pre_handler: p->addr = 0x%p, ip = 0x%lx, flags = 0x%lx\n",
            p->symbol_name, p->addr, regs->psw.addr, regs->flags);
#endif

    /* A dump_stack() here will give a stack backtrace */
    return 0;
}

/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
                         unsigned long flags)
{
#ifdef CONFIG_X86
    pr_info("<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n",
            p->symbol_name, p->addr, regs->flags);
#endif
#ifdef CONFIG_PPC
    pr_info("<%s> post_handler: p->addr = 0x%p, msr = 0x%lx\n",
            p->symbol_name, p->addr, regs->msr);
#endif
#ifdef CONFIG_MIPS
    pr_info("<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n",
            p->symbol_name, p->addr, regs->cp0_status);
#endif
#ifdef CONFIG_TILEGX
    pr_info("<%s> post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
            p->symbol_name, p->addr, regs->ex1);
#endif
#ifdef CONFIG_ARM64
    pr_info("<%s> post_handler: p->addr = 0x%p, pstate = 0x%lx\n",
            p->symbol_name, p->addr, (long)regs->pstate);
#endif
#ifdef CONFIG_S390
    pr_info("<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n",
            p->symbol_name, p->addr, regs->flags);
#endif
}

/*
 * fault_handler: this is called if an exception is generated for any
 * instruction within the pre- or post-handler, or when Kprobes
 * single-steps the probed instruction.
 * (Newer kernels no longer have this field in struct kprobe.)
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
    pr_info("fault_handler: p->addr = 0x%p, trap #%d\n", p->addr, trapnr);
    /* Return 0 because we don't handle the fault. */
    return 0;
}

static int __init kprobe_init(void)
{
    int ret;

    kp.pre_handler = handler_pre;
    kp.post_handler = handler_post;
    kp.fault_handler = handler_fault;

    ret = register_kprobe(&kp);
    if (ret < 0) {
        pr_err("register_kprobe failed, returned %d\n", ret);
        return ret;
    }
    pr_info("Planted kprobe at %p\n", kp.addr);
    return 0;
}

static void __exit kprobe_exit(void)
{
    unregister_kprobe(&kp);
    pr_info("kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init)
module_exit(kprobe_exit)
MODULE_LICENSE("GPL");

kretprobe

Execution flow

Kretprobe entry-handler
^^^^^^^^^^^^^^^^^^^^^^^

Kretprobes also provides an optional user-specified handler which runs
on function entry. This handler is specified by setting the entry_handler
field of the kretprobe struct. Whenever the kprobe placed by kretprobe at the
function entry is hit, the user-defined entry_handler, if any, is invoked.
If the entry_handler returns 0 (success) then a corresponding return handler
is guaranteed to be called upon function return. If the entry_handler
returns a non-zero error then Kprobes leaves the return address as is, and
the kretprobe has no further effect for that particular function instance.

Multiple entry and return handler invocations are matched using the unique
kretprobe_instance object associated with them. Additionally, a user
may also specify per return-instance private data to be part of each
kretprobe_instance object. This is especially useful when sharing private
data between corresponding user entry and return handlers. The size of each
private data object can be specified at kretprobe registration time by
setting the data_size field of the kretprobe struct. This data can be
accessed through the data field of each kretprobe_instance object.

In case probed function is entered but there is no kretprobe_instance
object available, then in addition to incrementing the nmissed count,
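the user entry_handler invocation is also skipped.

1. A kprobe is planted at the entry of the specified function.
2. When that kprobe is hit at function entry, Kprobes saves the probed function's return address and replaces it with the address of a "trampoline". The trampoline is Kprobes' own code, which in turn calls the return handler supplied by the user.
3. When the probed function eventually returns (ret), the CPU jumps to the trampoline, and the return handler runs.
4. When the handler finishes, the instruction pointer is set to the saved return address, and execution continues from there.
5. When the kretprobe is no longer needed, the kprobe at the function entry is removed.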
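
The return-address swap, together with the instance pool described in the quoted documentation, can be modeled in user space. This is a sketch under assumptions: the "stack slot" is a plain variable, `TOY_TRAMPOLINE` is an arbitrary marker value, and every `toy_` name is hypothetical. `TOY_MAXACTIVE` plays the role of the `maxactive` field: once all instances are in use, further entries are counted in `nmissed` and left untouched.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TRAMPOLINE ((uintptr_t)0xDEAD0000u) /* arbitrary marker address */
#define TOY_MAXACTIVE 2                         /* size of the instance pool */

/* Hypothetical stand-in for struct kretprobe_instance. */
struct toy_instance {
    int in_use;
    uintptr_t real_ret; /* saved original return address */
    uintptr_t *slot;    /* stack slot that was overwritten */
};

/* Hypothetical stand-in for struct kretprobe. */
struct toy_kretprobe {
    struct toy_instance pool[TOY_MAXACTIVE];
    int nmissed; /* entries skipped because the pool was exhausted */
    int handled; /* times the return handler ran */
};

/*
 * Entry probe hit: save the real return address and redirect the slot to
 * the trampoline. Returns 0 on success, -1 if no instance was free.
 */
static int toy_on_entry(struct toy_kretprobe *rp, uintptr_t *ret_slot)
{
    size_t i;

    for (i = 0; i < TOY_MAXACTIVE; i++) {
        struct toy_instance *ri = &rp->pool[i];

        if (!ri->in_use) {
            ri->in_use = 1;
            ri->real_ret = *ret_slot;
            ri->slot = ret_slot;
            *ret_slot = TOY_TRAMPOLINE;
            return 0;
        }
    }
    rp->nmissed++; /* pool exhausted: this call is not traced */
    return -1;
}

/*
 * Function return: yields the address at which execution continues. If the
 * slot points at the trampoline, run the "return handler" (here just a
 * counter), free the instance, and restore the real return address.
 */
static uintptr_t toy_on_return(struct toy_kretprobe *rp, uintptr_t *ret_slot)
{
    size_t i;

    if (*ret_slot != TOY_TRAMPOLINE)
        return *ret_slot; /* never redirected: plain return */

    for (i = 0; i < TOY_MAXACTIVE; i++) {
        struct toy_instance *ri = &rp->pool[i];

        if (ri->in_use && ri->slot == ret_slot) {
            rp->handled++;
            ri->in_use = 0;
            *ret_slot = ri->real_ret;
            return ri->real_ret;
        }
    }
    return 0; /* unreachable in this model: every redirect has an instance */
}
```

With a pool of two, a third concurrent entry is missed, and its return behaves as if no probe existed.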

Example

Here is the sample source from the samples directory of the Linux kernel tree:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/ktime.h>
#include <linux/limits.h>
#include <linux/sched.h>

/* Default probe target. _do_fork was renamed kernel_clone in Linux 5.10. */
static char func_name[NAME_MAX] = "_do_fork";
module_param_string(func, func_name, NAME_MAX, S_IRUGO);
MODULE_PARM_DESC(func, "Function to kretprobe; this module will report the"
                 " function's execution time");

/* per-instance private data */
struct my_data {
    ktime_t entry_stamp;
};

/* Here we use the entry_handler to timestamp function entry */
static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    struct my_data *data;

    if (!current->mm)
        return 1; /* Skip kernel threads */

    data = (struct my_data *)ri->data;
    data->entry_stamp = ktime_get();
    return 0;
}

/*
 * Return-probe handler: Log the return value and duration. Duration may turn
 * out to be zero consistently, depending upon the granularity of time
 * accounting on the platform.
 */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    unsigned long retval = regs_return_value(regs);
    struct my_data *data = (struct my_data *)ri->data;
    s64 delta;
    ktime_t now;

    now = ktime_get();
    delta = ktime_to_ns(ktime_sub(now, data->entry_stamp));
    pr_info("%s returned %lu and took %lld ns to execute\n",
            func_name, retval, (long long)delta);
    return 0;
}

static struct kretprobe my_kretprobe = {
    .handler = ret_handler,
    .entry_handler = entry_handler,
    .data_size = sizeof(struct my_data),
    /* Probe up to 20 instances concurrently. */
    .maxactive = 20,
};

static int __init kretprobe_init(void)
{
    int ret;

    my_kretprobe.kp.symbol_name = func_name;
    ret = register_kretprobe(&my_kretprobe);
    if (ret < 0) {
        pr_err("register_kretprobe failed, returned %d\n", ret);
        return ret; /* propagate the real error code */
    }
    pr_info("Planted return probe at %s: %p\n",
            my_kretprobe.kp.symbol_name, my_kretprobe.kp.addr);
    return 0;
}

static void __exit kretprobe_exit(void)
{
    unregister_kretprobe(&my_kretprobe);
    pr_info("kretprobe at %p unregistered\n", my_kretprobe.kp.addr);

    /* nmissed > 0 suggests that maxactive was set too low. */
    pr_info("Missed probing %d instances of %s\n",
            my_kretprobe.nmissed, my_kretprobe.kp.symbol_name);
}

module_init(kretprobe_init)
module_exit(kretprobe_exit)
MODULE_LICENSE("GPL");

jprobe

Execution flow

A JProbe has to transfer control to another function which has the same prototype as the function on which the probe was placed and then give back control to the original function with the same state as there was before the JProbe was executed. A JProbe leverages the mechanism used by a KProbe. Instead of calling a user-defined pre-handler a JProbe specifies its own pre-handler called setjmp_pre_handler() and uses another handler called a break_handler. This is a three-step process.

In the first step, when the breakpoint is hit control reaches kprobe_handler() which calls the JProbe pre-handler (setjmp_pre_handler()). This saves the stack contents and the registers before changing the eip to the address of the user-defined function. Then it returns 1 which tells kprobe_handler() to simply return instead of setting up single-stepping as for a KProbe. On return control reaches the user-defined function to access the arguments of the original function. When the user defined function is done it calls jprobe_return() instead of doing a normal return.

In the second step jprobe_return() truncates the current stack frame and generates a breakpoint which transfers control to kprobe_handler() through do_int3(). kprobe_handler() finds that the generated breakpoint address (address of int3 instruction in jprobe_handler()) does not have a registered probe however KProbes is active on the current CPU. It assumes that the breakpoint must have been generated by JProbes and hence calls the break_handler of the current_kprobe which it saved earlier. The break_handler restores the stack contents and the registers that were saved before transferring control to the user-defined function and returns.

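In the third step kprobe_handler() then sets up single-stepping of the instruction at which the JProbe was set and the rest of the sequence is the same as that of a KProbe.

1. A kprobe is planted at the entry of the specified function.
2. When that kprobe is hit, kprobe_handler() runs and calls the jprobe pre-handler (setjmp_pre_handler()), which saves the probed function's context (stack and registers) and then sets the instruction pointer to the address in jprobe->entry (the user's jprobe handler).
3. The jprobe handler ends by calling jprobe_return(), which leads to the probed function's saved context being restored.
4. Control returns to the probed function, which continues executing.
5. When the jprobe is no longer needed, the kprobe at the function entry is removed.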
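
The save, divert, and restore sequence has a loose user-space analogue in `setjmp`/`longjmp`, which the name setjmp_pre_handler() already hints at. The sketch below is only an analogy under assumptions: real jprobes used a breakpoint and saved registers and stack contents, not the C library, and all `toy_` names are hypothetical. The mirror handler has the same prototype as the "probed" function, sees its arguments, and ends with a call that never returns normally.

```c
#include <setjmp.h>

static jmp_buf toy_saved_context;    /* stands in for saved stack + registers */
static unsigned long toy_seen_flags; /* what the mirror handler observed */

/* Counterpart of jprobe_return(): hands control back, never returns. */
static void toy_jprobe_return(void)
{
    longjmp(toy_saved_context, 1);
}

/* Mirror handler: same prototype as the probed function below. */
static long toy_jdo_fork(unsigned long clone_flags, unsigned long stack_size)
{
    (void)stack_size;
    toy_seen_flags = clone_flags; /* inspect the arguments */
    toy_jprobe_return();          /* always end with this call */
    return 0;                     /* NOTREACHED */
}

/* The "probed" function; its body is arbitrary for this model. */
static long toy_do_fork(unsigned long clone_flags, unsigned long stack_size)
{
    return (long)(clone_flags + stack_size);
}

/* Models a call to the probed function while the probe is planted. */
static long toy_probed_call(unsigned long clone_flags, unsigned long stack_size)
{
    if (setjmp(toy_saved_context) == 0) {
        /* First pass: context saved, divert to the mirror handler. */
        toy_jdo_fork(clone_flags, stack_size);
    }
    /* Second pass: context restored, the original function runs. */
    return toy_do_fork(clone_flags, stack_size);
}
```

The caller gets the original function's result unchanged, while the mirror handler has seen the same arguments along the way.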

Example

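My impression is that, in terms of effect, jprobe and kprobe are much the same. I am not sure whether that is why the samples directory has no jprobe example (the jprobe API was deprecated around Linux 4.15 and has since been removed from mainline), so here is one I found online: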

/* jprobe_test.c */
/* Needs an older kernel that still provides the jprobe API and do_fork(). */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/uio.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>

/*
 * Jumper probe for do_fork.
 * Mirror principle enables access to arguments of the probed routine
 * from the probe handler.
 */

/* Proxy routine having the same arguments as actual do_fork() routine */
long jdo_fork(unsigned long clone_flags,
              unsigned long stack_start,
              struct pt_regs *regs,
              unsigned long stack_size,
              int __user *parent_tidptr,
              int __user *child_tidptr)
{
    pr_info("jprobe: clone_flags=0x%lx, stack_size=0x%lx, regs=0x%p\n",
            clone_flags, stack_size, regs);
    /* Always end with a call to jprobe_return(). */
    jprobe_return();
    /* NOTREACHED */
    return 0;
}

static struct jprobe my_jprobe = {
    .entry = (kprobe_opcode_t *)jdo_fork
};

int init_module(void)
{
    int ret;

    my_jprobe.kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("do_fork");
    if (!my_jprobe.kp.addr) {
        pr_err("Couldn't find %s to plant jprobe\n", "do_fork");
        return -1;
    }
    ret = register_jprobe(&my_jprobe);
    if (ret < 0) {
        pr_err("register_jprobe failed, returned %d\n", ret);
        return ret; /* propagate the real error code */
    }
    pr_info("Planted jprobe at %p, handler addr %p\n",
            my_jprobe.kp.addr, my_jprobe.entry);
    return 0;
}

void cleanup_module(void)
{
    unregister_jprobe(&my_jprobe);
    pr_info("jprobe unregistered\n");
}

MODULE_LICENSE("GPL");

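Summary

Following agent-smith, I wrote a small toy that uses a kprobe to capture process-start events through the kernel's sys_execve system call. The kprobe interfaces involved are fairly simple, so I won't go into more detail here.

References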

https://lwn.net/Articles/132196/

https://www.kernel.org/doc/Documentation/kprobes.txt

https://www.cnblogs.com/LittleHann/p/3854977.html

https://www.cnblogs.com/LittleHann/p/3920387.html

https://kernelgo.org/kprobe.html

https://blog.csdn.net/luckyapple1028/article/details/54350410

https://www.cnblogs.com/jzssuanfa/p/7373811.html

https://www.cnblogs.com/arnoldlu/p/9752061.html