
Linux mmap Usage Analysis

Preface

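When an agent relies on a kernel module to capture events, how are those events handed to the event-capture engine that runs in user space for further processing? This touches a classic part of the Linux kernel: inter-process communication. Linux offers many IPC mechanisms. For the kernel-to-user-space communication that intrusion detection needs, the most commonly used ones include, but are not limited to, the following three:

1. netlink. The typical example is yulong, which I covered in an earlier post: its driver layer puts each event into an skb and sends it over netlink to the user-space process bound to the designated netlink port. In my view it is one of the most efficient and elegant kernel-to-user communication methods available today.
2. mmap shared memory. agent-smith, sysdig and others use this mode to send kernel event messages out. It is efficient, but it has no built-in message synchronization.
3. ioctl. This is an older mechanism: well documented and simple to code, but it copies data, so it is ill-suited to bulk transfers, and the user-space program must issue the ioctl() call first. It is mostly used by user space to control a driver module.

This post records what I learned about the shared-memory approach, together with related material.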
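Because shared memory has no built-in message synchronization, the producer and consumer must agree on a protocol themselves. Below is a minimal user-space sketch of one common approach, assuming C11 atomics and exactly one producer and one consumer: a ring buffer placed in a MAP_SHARED | MAP_ANONYMOUS mapping, with a forked child standing in for the kernel-side producer and the parent playing the agent. The struct event layout, RING_SLOTS and all function names are hypothetical, and this is not a description of how agent-smith or sysdig implement their buffers. It also assumes 32-bit atomics are lock-free, which holds on common targets such as x86-64 and arm64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical event record; a real agent would carry pid, exe, cmdline, ... */
struct event { uint32_t seq; uint32_t pid; };

#define RING_SLOTS 64u  /* power of two, so the index wraps cleanly at 2^32 */

/* SPSC ring in shared memory: head is written only by the producer,
 * tail only by the consumer. */
struct ring {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
    struct event slots[RING_SLOTS];
};

static struct ring *ring_create(void)
{
    struct ring *r = mmap(NULL, sizeof(struct ring), PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return NULL;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return r;
}

/* Returns 0 if the ring is full (caller decides whether to drop or retry). */
static int ring_push(struct ring *r, struct event ev)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return 0;
    r->slots[head % RING_SLOTS] = ev;
    /* release: the slot contents become visible before the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Returns 0 if the ring is empty. */
static int ring_pop(struct ring *r, struct event *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->slots[tail % RING_SLOTS];
    /* release: the slot may be reused by the producer only after we read it */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Child = producer ("kernel"), parent = consumer ("agent").
 * Returns the sum of received sequence numbers, or -1 on error. */
static long run_demo(uint32_t n_events)
{
    struct ring *r = ring_create();
    if (!r)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(r, sizeof(struct ring));
        return -1;
    }
    if (pid == 0) {
        for (uint32_t i = 1; i <= n_events; i++)
            while (!ring_push(r, (struct event){ .seq = i, .pid = (uint32_t)getpid() }))
                ; /* ring full: spin; a real design would sleep or drop */
        _exit(0);
    }
    long sum = 0;
    struct event ev;
    for (uint32_t got = 0; got < n_events; ) {
        if (ring_pop(r, &ev)) {
            sum += ev.seq;
            got++;
        }
    }
    waitpid(pid, NULL, 0);
    munmap(r, sizeof(struct ring));
    return sum;
}
```

run_demo(1000) returns 500500, the sum 1 + 2 + ... + 1000, which shows that every event arrived exactly once and in a consistent state. Spinning keeps the sketch short; a real agent would block (for example with poll() or a futex) instead of busy-waiting.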

Principles

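The mmap system call maps a file or another object into memory. Once the mapping exists, the data in that file or object can be read and written directly through virtual memory addresses. The immediate reason to communicate through mmap is efficiency. With a copy function such as copy_to_user, every transfer copies the data, and the function also validates the user-space pointer, which slows things down. With mmap, a range of physical pages is mapped into a range of the process's virtual address space once; afterwards, accesses are ordinary loads and stores with no per-access pointer validation and no repeated copying, so throughput is higher.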
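To make the "no copy" point concrete, this sketch maps the same file twice with MAP_SHARED and writes through one mapping; the other mapping sees the bytes immediately, with no read()/write() call in between, because both virtual ranges are backed by the same page-cache page. The scratch file under /tmp and the function name are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one scratch file twice and writes through the first mapping.
 * Returns 1 if the second mapping sees the bytes, 0 if not, -1 on setup error.
 * msg must be shorter than one page. */
static int shared_view_demo(const char *msg)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical scratch file */
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* data lives on while fd/mappings exist */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    char *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                                /* mappings stay valid after close() */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, (size_t)page);
        if (b != MAP_FAILED) munmap(b, (size_t)page);
        return -1;
    }

    /* Two different virtual addresses, one physical page: no copy involved. */
    strcpy(a, msg);
    int same = (a != b) && strcmp(b, msg) == 0;

    munmap(a, (size_t)page);
    munmap(b, (size_t)page);
    return same;
}
```

For example, shared_view_demo("reverse shell event") returns 1.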

mmap usage:

#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
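A minimal sketch of the basic call pattern with these two prototypes, assuming an ordinary regular file: map it read-only, scan it through the returned pointer, and release it with munmap(). count_lines is a hypothetical helper name. A zero-length file needs special handling because mmap() rejects a length of 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts '\n' in a file by mapping it instead of read()-ing it.
 * Returns -1 on error. */
static long count_lines(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() fails with EINVAL for length 0 */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* closing fd does not unmap */
    if (p == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            lines++;

    munmap((void *)p, len);         /* length need not be page-aligned */
    return lines;
}
```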

The parameters are described in the following excerpt from the mmap(2) man page, so I won't repeat them here:

DESCRIPTION
mmap() creates a new mapping in the virtual address space of the
calling process. The starting address for the new mapping is
specified in addr. The length argument specifies the length of
the mapping (which must be greater than 0).

If addr is NULL, then the kernel chooses the (page-aligned)
address at which to create the mapping; this is the most portable
method of creating a new mapping. If addr is not NULL, then the
kernel takes it as a hint about where to place the mapping; on
Linux, the kernel will pick a nearby page boundary (but always
above or equal to the value specified by
/proc/sys/vm/mmap_min_addr) and attempt to create the mapping
there. If another mapping already exists there, the kernel picks
a new address that may or may not depend on the hint. The
address of the new mapping is returned as the result of the call.

The contents of a file mapping (as opposed to an anonymous
mapping; see MAP_ANONYMOUS below), are initialized using length
bytes starting at offset offset in the file (or other object)
referred to by the file descriptor fd. offset must be a multiple
of the page size as returned by sysconf(_SC_PAGE_SIZE).

After the mmap() call has returned, the file descriptor, fd, can
be closed immediately without invalidating the mapping.
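Because offset must be a multiple of the page size, reading from an arbitrary file offset means rounding the offset down to a page boundary and skipping the extra leading bytes in the returned pointer. A sketch of that arithmetic follows; struct view, view_open, view_close and read_byte_at are hypothetical names. With 4 KiB pages, offset 5000 is mapped from 4096 and the data pointer is advanced by 904 bytes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A mapping of len bytes starting at an arbitrary (non-negative) file offset. */
struct view {
    void *base;        /* what mmap() returned (page-aligned) */
    size_t map_len;    /* what must be passed to munmap() */
    const char *data;  /* points at the requested byte */
};

/* Round the offset down to a page boundary, map the extra leading bytes,
 * and skip them in the data pointer. */
static int view_open(int fd, off_t offset, size_t len, struct view *v)
{
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);   /* 5000 -> 4096 with 4 KiB pages */
    size_t delta = (size_t)(offset - aligned);

    v->map_len = len + delta;
    v->base = mmap(NULL, v->map_len, PROT_READ, MAP_SHARED, fd, aligned);
    if (v->base == MAP_FAILED)
        return -1;
    v->data = (const char *)v->base + delta;
    return 0;
}

static void view_close(struct view *v)
{
    munmap(v->base, v->map_len);
}

/* Usage: returns the byte at offset off of path, or -1 on error. */
static int read_byte_at(const char *path, off_t off)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct view v;
    int rc = view_open(fd, off, 1, &v);
    close(fd);                                  /* the mapping survives close() */
    if (rc != 0)
        return -1;
    int c = (unsigned char)v.data[0];
    view_close(&v);
    return c;
}
```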

The prot argument describes the desired memory protection of the
mapping (and must not conflict with the open mode of the file).
It is either PROT_NONE or the bitwise OR of one or more of the
following flags:

PROT_EXEC
Pages may be executed.

PROT_READ
Pages may be read.

PROT_WRITE
Pages may be written.

PROT_NONE
Pages may not be accessed.

The flags argument
The flags argument determines whether updates to the mapping are
visible to other processes mapping the same region, and whether
updates are carried through to the underlying file. This
behavior is determined by including exactly one of the following
values in flags:

MAP_SHARED
Share this mapping. Updates to the mapping are visible to
other processes mapping the same region, and (in the case
of file-backed mappings) are carried through to the
underlying file. (To precisely control when updates are
carried through to the underlying file requires the use of
msync(2).)

MAP_SHARED_VALIDATE (since Linux 4.15)
This flag provides the same behavior as MAP_SHARED except
that MAP_SHARED mappings ignore unknown flags in flags.
By contrast, when creating a mapping using
MAP_SHARED_VALIDATE, the kernel verifies all passed flags
are known and fails the mapping with the error EOPNOTSUPP
for unknown flags. This mapping type is also required to
be able to use some mapping flags (e.g., MAP_SYNC).

MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the
mapping are not visible to other processes mapping the
same file, and are not carried through to the underlying
file. It is unspecified whether changes made to the file
after the mmap() call are visible in the mapped region.

Both MAP_SHARED and MAP_PRIVATE are described in POSIX.1-2001 and
POSIX.1-2008. MAP_SHARED_VALIDATE is a Linux extension.
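A small experiment contrasting the two mapping types on one scratch file (the /tmp path and the function name are assumptions): a write through the MAP_PRIVATE mapping triggers copy-on-write and never reaches the file, while a write through the MAP_SHARED mapping lands in the page cache and is visible to pread().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes 'P' at byte 0 through a MAP_PRIVATE mapping and 'S' at byte 1
 * through a MAP_SHARED mapping, then reads both bytes back from the file. */
static int private_vs_shared(char *priv_byte, char *shared_byte)
{
    char path[] = "/tmp/mmap_flags_XXXXXX";   /* hypothetical scratch file */
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    char *priv = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    char *shared = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (priv == MAP_FAILED || shared == MAP_FAILED) {
        if (priv != MAP_FAILED) munmap(priv, (size_t)page);
        if (shared != MAP_FAILED) munmap(shared, (size_t)page);
        close(fd);
        return -1;
    }

    priv[0] = 'P';     /* copy-on-write: only this process's private copy changes */
    shared[1] = 'S';   /* goes to the page cache, hence to the file */
    /* pread() reads the same page cache, so msync() is not needed for this
     * check; it matters when the data must reach the disk. */
    msync(shared, (size_t)page, MS_SYNC);

    int rc = (pread(fd, priv_byte, 1, 0) == 1 && pread(fd, shared_byte, 1, 1) == 1) ? 0 : -1;
    munmap(priv, (size_t)page);
    munmap(shared, (size_t)page);
    close(fd);
    return rc;
}
```

After the call, byte 0 of the file is still 0 and byte 1 is 'S'.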

In addition, zero or more of the following values can be ORed in
flags:

MAP_32BIT (since Linux 2.4.20, 2.6)
Put the mapping into the first 2 Gigabytes of the process
address space. This flag is supported only on x86-64, for
64-bit programs. It was added to allow thread stacks to
be allocated somewhere in the first 2 GB of memory, so as
to improve context-switch performance on some early 64-bit
processors. Modern x86-64 processors no longer have this
performance problem, so use of this flag is not required
on those systems. The MAP_32BIT flag is ignored when
MAP_FIXED is set.

MAP_ANON
Synonym for MAP_ANONYMOUS; provided for compatibility with
other implementations.

MAP_ANONYMOUS
The mapping is not backed by any file; its contents are
initialized to zero. The fd argument is ignored; however,
some implementations require fd to be -1 if MAP_ANONYMOUS
(or MAP_ANON) is specified, and portable applications
should ensure this. The offset argument should be zero.
The use of MAP_ANONYMOUS in conjunction with MAP_SHARED is
supported on Linux only since kernel 2.4.

MAP_DENYWRITE
This flag is ignored. (Long ago—Linux 2.0 and earlier—it
signaled that attempts to write to the underlying file
should fail with ETXTBSY. But this was a source of
denial-of-service attacks.)

MAP_EXECUTABLE
This flag is ignored.

MAP_FILE
Compatibility flag. Ignored.

MAP_FIXED
Don't interpret addr as a hint: place the mapping at
exactly that address. addr must be suitably aligned: for
most architectures a multiple of the page size is
sufficient; however, some architectures may impose
additional restrictions. If the memory region specified
by addr and len overlaps pages of any existing mapping(s),
then the overlapped part of the existing mapping(s) will
be discarded. If the specified address cannot be used,
mmap() will fail.

Software that aspires to be portable should use the
MAP_FIXED flag with care, keeping in mind that the exact
layout of a process's memory mappings is allowed to change
significantly between kernel versions, C library versions,
and operating system releases. Carefully read the
discussion of this flag in NOTES!

MAP_FIXED_NOREPLACE (since Linux 4.17)
This flag provides behavior that is similar to MAP_FIXED
with respect to the addr enforcement, but differs in that
MAP_FIXED_NOREPLACE never clobbers a preexisting mapped
range. If the requested range would collide with an
existing mapping, then this call fails with the error
EEXIST. This flag can therefore be used as a way to
atomically (with respect to other threads) attempt to map
an address range: one thread will succeed; all others will
report failure.

Note that older kernels which do not recognize the
MAP_FIXED_NOREPLACE flag will typically (upon detecting a
collision with a preexisting mapping) fall back to a "non-MAP_FIXED"
type of behavior: they will return an address
that is different from the requested address. Therefore,
backward-compatible software should check the returned
address against the requested address.
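The backward-compatibility advice above (compare the returned address with the requested one) can be wrapped in a helper like the following sketch. map_exactly_at is a hypothetical name. The fallback #define uses 0x100000, the value in the generic Linux UAPI header; other architectures may define it differently, so treat the fallback as an assumption and prefer the libc definition when present.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000  /* generic UAPI value; assumption for old libcs */
#endif

/* Places an anonymous mapping at exactly addr without clobbering anything.
 * Returns the mapping, or MAP_FAILED with errno set (EEXIST on collision).
 * Also covers pre-4.17 kernels, which treat the flag as a mere hint. */
static void *map_exactly_at(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p != MAP_FAILED && p != addr) {
        /* Old kernel ignored the flag and picked another address. */
        munmap(p, len);
        errno = EEXIST;
        return MAP_FAILED;
    }
    return p;
}
```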

MAP_GROWSDOWN
This flag is used for stacks. It indicates to the kernel
virtual memory system that the mapping should extend
downward in memory. The return address is one page lower
than the memory area that is actually created in the
process's virtual address space. Touching an address in
the "guard" page below the mapping will cause the mapping
to grow by a page. This growth can be repeated until the
mapping grows to within a page of the high end of the next
lower mapping, at which point touching the "guard" page
will result in a SIGSEGV signal.

MAP_HUGETLB (since Linux 2.6.32)
Allocate the mapping using "huge pages." See the Linux
kernel source file
Documentation/admin-guide/mm/hugetlbpage.rst for further information, as well
as NOTES, below.

MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
Used in conjunction with MAP_HUGETLB to select alternative
hugetlb page sizes (respectively, 2 MB and 1 GB) on
systems that support multiple hugetlb page sizes.

More generally, the desired huge page size can be
configured by encoding the base-2 logarithm of the desired
page size in the six bits at the offset MAP_HUGE_SHIFT.
(A value of zero in this bit field provides the default
huge page size; the default huge page size can be
discovered via the Hugepagesize field exposed by
/proc/meminfo.) Thus, the above two constants are defined
as:

#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)

The range of huge page sizes that are supported by the
system can be discovered by listing the subdirectories in
/sys/kernel/mm/hugepages.
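As a worked example of this encoding: 2 MB = 2^21 bytes, so MAP_HUGE_2MB is 21 << MAP_HUGE_SHIFT, and 1 GB = 2^30 bytes gives 30 << MAP_HUGE_SHIFT. The sketch below computes the bits for any power-of-two size. huge_page_flag is a hypothetical helper, and the fallback values MAP_HUGE_SHIFT = 26 and MAP_HUGE_MASK = 0x3f are the Linux values, used only if the libc headers lack them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26   /* Linux value; fallback only for old headers */
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f  /* six bits hold log2(page size) */
#endif

/* Encodes a huge page size (bytes, power of two) as log2(size) << MAP_HUGE_SHIFT.
 * Returns 0 and stores the bits in *flag, or -1 for an invalid size. */
static int huge_page_flag(size_t size, unsigned *flag)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return -1;                     /* not a power of two */
    unsigned lg = 0;
    while ((size >>= 1) != 0)
        lg++;                          /* 2 MiB = 2^21 -> 21 */
    if (lg > MAP_HUGE_MASK)
        return -1;
    *flag = lg << MAP_HUGE_SHIFT;      /* 21 << 26 is MAP_HUGE_2MB */
    return 0;
}
```

The result is ORed into flags together with MAP_HUGETLB; the mmap() call itself only succeeds if huge pages of that size are available on the system.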

MAP_LOCKED (since Linux 2.5.37)
Mark the mapped region to be locked in the same way as
mlock(2). This implementation will try to populate
(prefault) the whole range but the mmap() call doesn't
fail with ENOMEM if this fails. Therefore major faults
might happen later on. So the semantic is not as strong
as mlock(2). One should use mmap() plus mlock(2) when
major faults are not acceptable after the initialization
of the mapping. The MAP_LOCKED flag is ignored in older
kernels.

MAP_NONBLOCK (since Linux 2.5.46)
This flag is meaningful only in conjunction with
MAP_POPULATE. Don't perform read-ahead: create page
tables entries only for pages that are already present in
RAM. Since Linux 2.6.23, this flag causes MAP_POPULATE to
do nothing. One day, the combination of MAP_POPULATE and
MAP_NONBLOCK may be reimplemented.

MAP_NORESERVE
Do not reserve swap space for this mapping. When swap
space is reserved, one has the guarantee that it is
possible to modify the mapping. When swap space is not
reserved one might get SIGSEGV upon a write if no physical
memory is available. See also the discussion of the file
/proc/sys/vm/overcommit_memory in proc(5). In kernels
before 2.6, this flag had effect only for private writable
mappings.

MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file
mapping, this causes read-ahead on the file. This will
help to reduce blocking on page faults later.
MAP_POPULATE is supported for private mappings only since
Linux 2.6.23.

MAP_STACK (since Linux 2.6.27)
Allocate the mapping at an address suitable for a process
or thread stack.

This flag is currently a no-op on Linux. However, by
employing this flag, applications can ensure that they
transparently obtain support if the flag is implemented in
the future. Thus, it is used in the glibc threading
implementation to allow for the fact that some
architectures may (later) require special treatment for
stack allocations. A further reason to employ this flag
is portability: MAP_STACK exists (and has an effect) on
some other systems (e.g., some of the BSDs).

MAP_SYNC (since Linux 4.15)
This flag is available only with the MAP_SHARED_VALIDATE
mapping type; mappings of type MAP_SHARED will silently
ignore this flag. This flag is supported only for files
supporting DAX (direct mapping of persistent memory). For
other files, creating a mapping with this flag results in
an EOPNOTSUPP error.

Shared file mappings with this flag provide the guarantee
that while some memory is mapped writable in the address
space of the process, it will be visible in the same file
at the same offset even after the system crashes or is
rebooted. In conjunction with the use of appropriate CPU
instructions, this provides users of such mappings with a
more efficient way of making data modifications
persistent.

MAP_UNINITIALIZED (since Linux 2.6.33)
Don't clear anonymous pages. This flag is intended to
improve performance on embedded devices. This flag is
honored only if the kernel was configured with the
CONFIG_MMAP_ALLOW_UNINITIALIZED option. Because of the
security implications, that option is normally enabled
only on embedded devices (i.e., devices where one has
complete control of the contents of user memory).

Of the above flags, only MAP_FIXED is specified in POSIX.1-2001
and POSIX.1-2008. However, most systems also support
MAP_ANONYMOUS (or its synonym MAP_ANON).

munmap()
The munmap() system call deletes the mappings for the specified
address range, and causes further references to addresses within
the range to generate invalid memory references. The region is
also automatically unmapped when the process is terminated. On
the other hand, closing the file descriptor does not unmap the
region.

The address addr must be a multiple of the page size (but length
need not be). All pages containing a part of the indicated range
are unmapped, and subsequent references to these pages will
generate SIGSEGV. It is not an error if the indicated range does
not contain any mapped pages.
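The munmap() rules above (page-aligned addr, arbitrary length, no error for already-unmapped ranges) can be checked with the sketch below. It uses mincore(), which fails with ENOMEM when part of the range is unmapped, to probe the mapping without risking SIGSEGV; is_mapped and munmap_demo are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if [addr, addr+len) is fully mapped, 0 if any page is unmapped,
 * -1 on other errors. addr must be page-aligned. */
static int is_mapped(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char vec[16];                 /* one byte per probed page */
    if ((len + page - 1) / page > sizeof vec)
        return -1;
    if (mincore(addr, len, vec) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}

/* Maps three pages and unmaps the middle one with a 1-byte length.
 * Returns 0 if the kernel behaves as described in munmap(2). */
static int munmap_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* addr must be page-aligned: this call fails with EINVAL */
    if (munmap(p + 1, page) != -1 || errno != EINVAL)
        return -1;

    /* length need not be: 1 byte still unmaps the whole second page */
    if (munmap(p + page, 1) != 0)
        return -1;

    int ok = is_mapped(p, page) == 1
          && is_mapped(p + page, page) == 0
          && is_mapped(p + 2 * page, page) == 1;

    /* A range that is partly unmapped already is not an error */
    ok = ok && munmap(p, 3 * page) == 0;
    return ok ? 0 : -1;
}
```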

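remap_pfn_range plays a similar role to mmap, except that it maps kernel memory into user space: a driver calls it from its mmap file operation to map physical pages, identified by page frame number (PFN), into the user's VMA. Its prototype (declared in <linux/mm.h>) is:

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot);

vma is the user virtual memory area being populated, addr is the user-space start address (usually vma->vm_start), pfn is the page frame number of the first physical page, size is the length in bytes, and prot is the page protection (usually vma->vm_page_prot).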

Example

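Flow: the kernel module allocates a buffer and, in the mmap handler of the character device it creates, calls the kernel function remap_pfn_range to map that buffer's physical page into the calling process. The user-space program then calls mmap on the device file to map it into its own virtual address space. The result is that the user-space process can read the event information captured by the kernel module directly. I put together a demo and pushed it to GitHub.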

Kernel Module:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/string.h>
#include <linux/errno.h>

/* Fake event that the module exposes to user space through the shared page */
static const char MMAP_TEST_STR[] = "reverse shell event{\n\t'evt':'rvshell',\n\t'pid':'19256',\n\t'exe':'/bin/bash',\n\t'cmdline':'bash',\n\t'cwd':'/root/Felicia',\n\t'ppid':'19255',\n\t'pexe':'/usr/bin/socat'\n}";

static struct cdev mmap_drv;
static dev_t ndev;
static char *buffer;	/* one page shared with user space */

static int mmap_drv_open(struct inode *nd, struct file *filp)
{
	return 0;
}

static int mmap_drv_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;
	unsigned long pfn;

	/* Only one page backs the buffer: refuse offsets or sizes beyond it,
	 * otherwise remap_pfn_range would expose unrelated kernel memory. */
	if (vma->vm_pgoff != 0 || size > PAGE_SIZE)
		return -EINVAL;

	/* Page frame number of the buffer's physical page */
	pfn = virt_to_phys(buffer) >> PAGE_SHIFT;
	if (remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot))
		return -EAGAIN;

	return 0;
}

static const struct file_operations mmap_drv_ops = {
	.owner = THIS_MODULE,
	.open  = mmap_drv_open,
	.mmap  = mmap_drv_mmap,
};

static int __init mmap_drv_init(void)
{
	int ret;

	BUILD_BUG_ON(sizeof(MMAP_TEST_STR) > PAGE_SIZE);

	/* A whole, zeroed, page-aligned page: exactly what remap_pfn_range maps */
	buffer = (char *)get_zeroed_page(GFP_KERNEL);
	if (!buffer)
		return -ENOMEM;
	/* Write the event once; every later mmap sees the same physical page */
	memcpy(buffer, MMAP_TEST_STR, sizeof(MMAP_TEST_STR));

	ret = alloc_chrdev_region(&ndev, 0, 1, "evt_map");
	if (ret < 0)
		goto free_buf;

	cdev_init(&mmap_drv, &mmap_drv_ops);
	mmap_drv.owner = THIS_MODULE;
	ret = cdev_add(&mmap_drv, ndev, 1);
	if (ret < 0)
		goto unregister;

	printk(KERN_INFO "mmap_drv_init: major=%d minor=%d\n", MAJOR(ndev), MINOR(ndev));
	return 0;

unregister:
	unregister_chrdev_region(ndev, 1);
free_buf:
	free_page((unsigned long)buffer);
	return ret;
}

static void __exit mmap_drv_exit(void)
{
	cdev_del(&mmap_drv);
	unregister_chrdev_region(ndev, 1);
	free_page((unsigned long)buffer);
}

module_init(mmap_drv_init);
module_exit(mmap_drv_exit);
MODULE_LICENSE("GPL");

User-space:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEV_PATH "/dev/evt_map"

int main(void)
{
    /* Map exactly one page, matching the single page exported by the module */
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p_map;

    int fd = open(DEV_PATH, O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "Fail to open %s. Error:%s\n", DEV_PATH, strerror(errno));
        return EXIT_FAILURE;
    }

    p_map = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p_map == MAP_FAILED) {
        /* Do not munmap MAP_FAILED: just report and clean up the fd */
        fprintf(stderr, "mmap fail: %s\n", strerror(errno));
        close(fd);
        return EXIT_FAILURE;
    }
    close(fd);  /* the mapping stays valid after close() */

    /* The module wrote a NUL-terminated string; print at most one page of it */
    printf("%.*s\n", (int)strnlen((const char *)p_map, page_size), (const char *)p_map);

    munmap(p_map, page_size);
    return 0;
}

Result:

root@eBPF:~/shared_mem_demo# make
make -C /lib/modules/4.15.0-20-generic/build M=/root/shared_mem_demo modules
make[1]: Entering directory '/usr/src/linux-headers-4.15.0-20-generic'
CC [M] /root/shared_mem_demo/kernel_module.o
/root/shared_mem_demo/kernel_module.o: warning: objtool: mmap_drv_mmap()+0x72: sibling call from callable instruction with modified stack frame
Building modules, stage 2.
MODPOST 1 modules
CC /root/shared_mem_demo/kernel_module.mod.o
LD [M] /root/shared_mem_demo/kernel_module.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.15.0-20-generic'
root@eBPF:~/shared_mem_demo#
root@eBPF:~/shared_mem_demo#
root@eBPF:~/shared_mem_demo# insmod kernel_module.ko
root@eBPF:~/shared_mem_demo#
root@eBPF:~/shared_mem_demo# cat /proc/devices | grep evt_map
243 evt_map
root@eBPF:~/shared_mem_demo#
root@eBPF:~/shared_mem_demo# mknod /dev/evt_map c 243 0
root@eBPF:~/shared_mem_demo#
root@eBPF:~/shared_mem_demo# gcc userspace.c -o userspace
root@eBPF:~/shared_mem_demo#
root@eBPF:~/shared_mem_demo# ./userspace
reverse shell event{
'evt':'rvshell',
'pid':'19256',
'exe':'/bin/bash',
'cmdline':'bash',
'cwd':'/root/Felicia',
'ppid':'19255',
'pexe':'/usr/bin/socat'
}
root@eBPF:~/shared_mem_demo#

Summary

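No real summary this time. I am putting the research on low-level implementation on hold for now; after writing this up I will go back to application-layer topics: Kubernetes attack and defense, and containers. 郑瀚 (Zheng Han) has documented Linux communication mechanisms in great detail and very thoroughly; his blog is worth consulting if I need to dig deeper later.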

References

https://nieyong.github.io/wiki_cpu/mmap%E8%AF%A6%E8%A7%A3.html

https://www.daimajiaoliu.com/series/linux_kernel/4713be91c10041c

https://blog.csdn.net/wuheshi/article/details/52911465?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-6&spm=1001.2101.3001.4242

https://man7.org/linux/man-pages/man2/mmap.2.html

https://www.ibm.com/developerworks/cn/linux/l-kernel-memory-access/

https://www.cnblogs.com/LittleHann/p/3867214.html

https://zhuanlan.zhihu.com/p/160836803

https://www.cnblogs.com/java-ssl-xy/p/7868531.html