There's no doubt about it, C (and C++) is a really powerful programming language, one that allows the developer to straddle both high-level layered abstractions (after all, object-oriented languages such as Java and Python are written in C) as well as to work upon the bare metal, as it were. This is fantastic. Of course, there's a price to pay: the compiler will do only so much. You want to overflow a memory buffer? Go ahead, it doesn't care. Want to peek at or poke an unmapped memory region? No problem.
Well, no problem for the compiler, but big problems for us! This is nothing new really. We mentioned just this in Chapter 2, Approaches to Kernel Debugging. C being a procedural and non-managed programming language (in memory terms), it's ultimately the programmer's responsibility to ensure that runtime memory usage is correct and well behaved.
The Linux kernel is almost entirely written in C (over 98% of the code is in C, as of the time of this writing). You see the potential for problems, right? (In fact, there's a slowly growing effort to begin porting the kernel, or portions of it, to a more memory-safe language such as Rust. See the Further reading section for links on this). In a similar vein, compilers are getting smarter. The Clang/Low Level Virtual Machine (LLVM) compiler – with which you can certainly build the kernel and modules – seems superior to the well-known GNU Compiler Collection or GCC compiler in terms of intelligent code generation, avoiding Out Of Bounds (OOB) accesses, and more. We cover some introductory material on using Clang as well here, though the focus is on the most commonly used GCC compiler. Here, we'll attempt to tackle this all-too-common and stubborn bug source – memory issues! The goal, after all, is to make your code memory safe.
Due to the vast scope of material to be covered on kernel memory debugging, we've split the discussion into two chapters, this one and the next.
In this chapter, we shall focus on and cover the following main topics (look out for detailed coverage of the kernel's SLUB debug framework and catching memory leakage in the next one):
The technical requirements and workspace remain identical to what's described in Chapter 1, A General Introduction to Debugging Software. The code examples can be found within the book's GitHub repository here: https://github.com/PacktPublishing/Linux-Kernel-Debugging. The only thing new in terms of software installation is the usage of the powerful Clang compiler. We cover the details in the Building your kernel and modules with Clang section.
The introductory paragraphs at the start of this chapter informed you of the annoying fact that though programming in C is like having a superpower (at least for your typical OS/driver/embedded domains), it's a double-edged sword: we humans inadvertently create defects and bugs. Memory bugs, especially, are simply all too common.
In fact, in Chapter 2, Approaches to Kernel Debugging, in the Types of bugs – the memory view section, we mentioned that among the different ways of classifying bug types is the memory view. For easy recollection – and to stress its importance here – I reproduce the short list of common memory corruption bug types:
These (except the last) are among the well-understood Undefined Behavior (UB) issues that a process, or even the OS, can blunder into. In this chapter, you'll learn about these issues – with the emphasis being within the kernel/driver code – and, more importantly, how to use various tools and approaches to catch them.
More precisely, within this chapter, we shall focus on the first two: incorrect memory accesses – which include all kinds of common memory bugs: UMR, OOB, UAF/UAR, and double-free. In the following chapter, we'll focus on catching memory defects in slab memory via the SLUB debug framework as well as detecting memory leaks. We'll cover data races and their complexities in Chapter 8, Lock Debugging, (as they are most commonly caused by incorrectly working with locks), and (internal) fragmentation, or wastage, will be mentioned in the next chapter, in the Learning to use the slabinfo and related utilities section.
It's Not Only about Bugs but Also about Security
Human error and C (and C++) create an unfortunate mix at times – bugs! But – and here's a key point – security issues very often tend to be bugs or defects at heart. This is why getting it right in the first place, and/or later hunting down and fixing bugs, is so critical to today's modern production systems and, indeed, the cloud (a huge portion of which is powered via the Linux kernel and its built-in hypervisor component – Kernel Virtual Machine (KVM)). Hackers currently have a pretty wide choice of OS-level exploits to choose from; this is especially true for older kernels. To see what I mean, take a peek here: https://github.com/xairy/linux-kernel-exploitation.
If nothing else, remember: unless you're running the latest stable kernel (which will have the latest bugfix and security patches), and have configured it with security in mind as well, you're asking for trouble! Again, see (much) more on Linux kernel security via a link in the Further reading section.
The goal is to have your project or product achieve memory safety.
Let's get to the important thing: what tools and/or approaches are available to you when debugging kernel memory issues? Several exist; among them are the following:
The first bullet point above – the one using which you can more or less directly catch kernel memory defects – is of course what we shall primarily focus on here. Subsequent chapters in this book will cover the indirect techniques mentioned in the second bullet point. So, patience – you'll get there. Also, as implied by the indirect wording, these may or may not help you catch memory bugs.
Okay. I'll attempt to summarize this information with specifics on the tools you can use in the following table. More detailed tables will be presented later in this chapter.
A few notes to match the numbers in square brackets in the second column:
All right! Now you know – in theory – how to catch memory bugs in the kernel or your driver, but in practice? Well, that requires you to learn to use the tool(s) mentioned above and practice! As mentioned already, understanding, configuring, and learning to leverage KASAN and UBSAN (along with using Clang) is the focus of this chapter (SLUB debug and kmemleak will be that of the next one). So, let's get on with it.
The Kernel Address Sanitizer (KASAN) is a port of the Address Sanitizer (ASAN) tooling to the Linux kernel. The ASAN project proved to be so useful in detecting memory-related defects that having similar abilities within the kernel was a no-brainer. ASAN is one of the few tools that could detect the buffer overread defect that was at the root of the (in)famous so-called Heartbleed exploit! See the Further reading section for a very interesting XKCD comic link that superbly illustrates the bug at the heart of Heartbleed.
A few points on KASAN will help you understand more:
Well, in terms of overhead from KASAN, it's really the RAM (more than CPU) overheads that can hurt. It does all depend on where you're coming from. For an enterprise-class server system, using several megabytes of RAM as overhead for KASAN can be considered tolerable. This is likely not the case for a resource-constrained embedded system (your typical Android smartphone, TV, wearable devices, low-end routers, and similar products being good examples). For this key reason, the modern Linux kernel supports three types, or modes, of KASAN implementations:
The first is the default and the one to use when actively debugging (or bug hunting). It has the largest relative overhead among the three, but is very effective at bug catching! The software tag-based approach has significantly lower overhead; it's appropriate for testing actual workloads. The third hardware tag-based version has the lowest overhead and is even suitable for production use!
Memory Checking on User-Mode Apps
The ASAN tooling was in fact first implemented (by Google engineers) as compiler instrumentation for userspace applications, initially in Clang/LLVM and soon after in GCC. The suite includes ASAN, Leak Sanitizer (LSAN), Memory Sanitizer (MSAN), Thread Sanitizer (TSAN), and Undefined Behavior Sanitizer (UBSAN). They – especially ASAN – are really powerful and are simply a must-use for userspace app memory checking! My earlier book Hands-On System Programming with Linux does cover using ASAN (and Valgrind) in some detail.
In the discussion that follows, I assume that the Generic KASAN mode is being employed, primarily for the purpose of (memory) debugging. Actually, as you'll see in the following section, this is a bit of a moot point as the other tag-based modes are currently only supported on ARM64.
Firstly, as KASAN (as well as UBSAN) are compiler-based technologies, which compiler should you use? Both GCC and Clang are supported. You will require a relatively recent version of the compiler to be able to leverage KASAN – as of this writing, you'll need the following:
The following table neatly summarizes some key information about KASAN:
The Kernel and Compilers
Traditionally, the Linux kernel has been very tightly coupled to the GCC compiler; that's slowly changing. Clang is now almost fully supported, and Rust is making an entry. In fact, FYI, Clang is typically used to compile Android Open Source Project (AOSP) kernels. We cover using Clang in the Building your kernel and modules with Clang section.
Next, hardware-wise, KASAN traditionally requires a 64-bit processor. Why? Recall that it uses a shadow memory region whose size is one-eighth of the kernel virtual address space. On an x86_64, the kernel VAS region is 128 TB (as is the user-mode Virtual Address Space (VAS) region). An eighth of this is significant – it's 16 terabytes. So, what platforms does KASAN actually work on? Quoting directly from the official kernel documentation: Currently, Generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390, and riscv architectures, and tag-based KASAN modes are supported only for arm64.
Did you notice? Even ARM – the ARM 32-bit processor – is supported! This is a recent thing, as of the 5.11 kernel. Not only that, as of this writing at least, the lower overhead tag-based KASAN type is supported only for ARM64. Did you pause to wonder, why ARM64? Clearly, it's due to the incredible popularity of Android. Many, if not most, Android devices are powered via an ARM64 core(s) within a System on Chip (SoC). Detecting memory defects on Android – both in userspace and within the kernel – is critical in today's information economy. Thus, tag-based KASAN modes work on this key platform!
In Table 5.2, I highlight Generic KASAN in bold as it's the one we're going to work with here.
Of course, you need to configure your kernel to support Generic KASAN mode. It's straightforward: enable it by setting CONFIG_KASAN=y. When performing the kernel config (via the usual method, the make menuconfig UI), you'll find the menu option here:
Kernel hacking | Memory Debugging | KASAN: runtime memory debugger
To make it a bit more interesting, let's configure the kernel for KASAN for ARM64:
make ARCH=arm64 menuconfig
The screenshot shows you how it looks (here, we've navigated to the KASAN sub-menu):
Keep the mode as Generic mode. The < Help > button will show you that this corresponds to the kernel config CONFIG_KASAN_GENERIC=y. In fact, this Help display reveals some interesting information:
This mode consumes about 1/8th of available memory at kernel start and introduces an overhead of ~x1.5 for the rest of the allocations. The performance slowdown is ~x3.
Also, here, only because it's the ARM64 architecture, does the kernel config option CONFIG_HAVE_ARCH_KASAN_SW_TAGS get initialized to y:
$ grep KASAN .config
CONFIG_KASAN_SHADOW_OFFSET=0xdfffffd000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_SW_TAGS=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
[...]
In addition, you can see how the kernel configures the shadow memory region start offset via the value allotted to the kernel config CONFIG_KASAN_SHADOW_OFFSET (it's a kernel virtual address of course) and other configs.
KASAN – Effect on the Build
With CONFIG_KASAN=y, building the kernel source tree by passing the V=1 parameter will show the details: the GCC flags being passed, and more. Here's a snippet of what you'd typically see, focused on the GCC flags passed during the build due to KASAN being enabled:
make V=1
gcc -Wp,-MMD,[...] -fsanitize=kernel-address -fasan-shadow-offset=0xdffffc0000000000 --param asan-globals=1 --param asan-instrumentation-with-call-threshold=0 --param asan-stack=1 --param asan-instrument-allocas=1 [...]
KASAN works essentially by being able to check every single memory access; it does this by using a technique called Compile Time Instrumentation (CTI). Put very simplistically, the compiler inserts function calls (__asan_load*() and __asan_store*()) before every 1-, 2-, 4-, 8-, or 16-byte memory access. Thus, the runtime can figure out whether the access is valid or not (by checking the corresponding shadow memory bytes). Now, there are two broad ways the compiler can perform this instrumentation: outline and inline. Outline instrumentation has the compiler inserting actual function calls (as just mentioned); inline instrumentation achieves the same thing but in a time-optimized manner by directly inserting the code (and not having the overhead of a function call)!
You can set the kernel config option Instrumentation type to either CONFIG_KASAN_OUTLINE (the default) or CONFIG_KASAN_INLINE. It's the typical trade-off: the outline type, the default, will result in a smaller kernel image while the inline type will result in a larger image but is faster (by a factor of 1.1x to 2x).
Also, (especially for your debug kernel), it's worth enabling the kernel config CONFIG_STACKTRACE, so that you also obtain stack traces of the allocation and freeing of affected slab objects in the report when a bug is detected. Similarly, turning on CONFIG_PAGE_OWNER – here within the menu Kernel hacking | Memory Debugging | Track page owner – will get you the stack traces of the allocation and freeing of affected physical pages. It's off by default; you have to boot with the parameter page_owner=on.
Also, when configuring an x86_64 for KASAN, you'll find an additional kernel config regarding vmalloc memory corruption detection. The option shows up like this:
[*] Back mappings in vmalloc space with real shadow memory
This helps detect vmalloc-related memory corruption issues (at the cost of higher memory usage during runtime).
So much for the theory and KASAN kernel config. Do configure and (re)build your (debug) kernel and we're good to give it a spin!
I'll assume that by now you've configured, built, and booted into your (debug) kernel that's enabled with KASAN (as the previous section has described in detail). On my setup – an x86_64 Ubuntu 20.04 LTS guest VM – this has been done.
To test whether KASAN works, we'll need to execute code that has memory bugs (I can almost hear some of you old-timers say "Yeah? That shouldn't be too hard"). We can always write our own test cases but why reinvent the wheel? This is a good opportunity to look at a part of the kernel's test infrastructure! The following section shows you how we'll leverage the kernel's KUnit unit testing framework to run KASAN test cases.
Why take the trouble to write our own test cases to test KASAN when the community has already done the work for us? Ah, the beauty of open source.
The Linux kernel has by now evolved sufficiently to have many kinds of test infrastructure, including full-fledged test suites, built into it; testing various aspects of the kernel is now a matter of configuring the kernel appropriately and running the tests!
With regard to possible built-in test frameworks within the kernel, the two primary ones are the KUnit framework and the kselftest framework. FYI, the official kernel documentation, of course, has all the details. As a start, you can check this one: Kernel Testing Guide: https://www.kernel.org/doc/html/latest/dev-tools/testing-overview.html#kernel-testing-guide – it provides a rough overview of available testing frameworks and tooling (including dynamic analysis) within the kernel.
Again, FYI, there are several other related and useful frameworks: kernel fault injection, notifier error injection, the Linux Kernel Dump Test Module (LKDTM), and so on. You'll find them under the kernel config here: Kernel hacking | Kernel Testing and Coverage.
Again, we don't intend to delve into the details of how KUnit works here; the idea is to merely use KUnit to test KASAN as a practical example at this point. For details on using these test frameworks – it will probably prove useful! – do see the links within the Further reading section.
As a pragmatic thing to do, and to begin getting familiar with it, let's leverage the kernel's KUnit – Unit Testing for the Linux kernel – framework to execute KASAN test cases!
It's really very simple to do. First, ensure your debug kernel is configured to use KUnit: CONFIG_KUNIT=y (or CONFIG_KUNIT=m).
We intend to run KASAN test cases, thus, we must have the KASAN test module configured as well:
CONFIG_KASAN_KUNIT_TEST=m
The kernel's module code for the KASAN test cases we're going to run is here: lib/test_kasan.c. A quick peek will show you the various test cases (there are many of them – 38 as of this writing):
// lib/test_kasan.c
static struct kunit_suite kasan_kunit_test_suite = {
.name = "kasan",
.init = kasan_test_init,
.test_cases = kasan_kunit_test_cases,
.exit = kasan_test_exit,
};
kunit_test_suite(kasan_kunit_test_suite);
This sets up the suite of test cases to execute. The actual test cases are in the kunit_suite structure's member named test_cases. It's a pointer to an array of kunit_case structures:
static struct kunit_case kasan_kunit_test_cases[] = {
KUNIT_CASE(kmalloc_oob_right),
KUNIT_CASE(kmalloc_oob_left),
[...]
KUNIT_CASE(kmalloc_double_kzfree),
KUNIT_CASE(vmalloc_oob),
{}
};
The KUNIT_CASE() macro sets up the test case. To help understand how it works, here's the code for the first of the test cases:
// lib/test_kasan.c
static void kmalloc_oob_right(struct kunit *test)
{
char *ptr;
size_t size = 123;
ptr = kmalloc(size, GFP_KERNEL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
KUNIT_EXPECT_KASAN_FAIL(test, ptr[size + OOB_TAG_OFF] = 'x');
kfree(ptr);
}
Quite intuitively, the actual checking occurs within the KUNIT_ASSERT|EXPECT_*() macros seen above. The first macro asserts that the return from the kmalloc() API doesn't result in an error and isn't null. The second macro, KUNIT_EXPECT_KASAN_FAIL(), has the KUnit code expect failure – a negative test case. This is indeed what should be done here: we expect that writing beyond the right side of the buffer (a write overflow defect) should trigger KASAN to report a failure! I'll leave it to you to study the implementation of these macros if interested.
Furthermore, and quite interestingly, the init and exit members of the kunit_suite structure specify functions to execute before and after each test case is run, respectively. The module leverages this to ensure that the kernel sysctl kasan_multi_shot is temporarily enabled and to set panic_on_warn to 0 (else, only the first invalid memory access would trigger a report and a possible kernel panic!).
Finally, let's try it out:
$ uname -r
5.10.60-dbg01
$ sudo modprobe test_kasan
This will cause all test cases within the KASAN test module to execute! Looking up the kernel log (via journalctl -k or dmesg) will show you the detailed KASAN reports for each of the test cases. As they're voluminous, I show a sampling of the output. The very first test case – KUNIT_CASE(kmalloc_oob_right) – causes KASAN to generate this report (its output is truncated – see more of it below):
Notice the following in the preceding screenshot:
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0x159/0x260 [test_kasan]
Write of size 1 at addr ffff8880316a45fb by task kunit_try_catch/1206
CPU: 2 PID: 1206 Comm: kunit_try_catch Tainted: G O 5.10.60-dbg01 #6
The remainder of the output can be seen in the following screenshot:
As CONFIG_PAGE_OWNER=y (as we suggested in the Configuring the kernel for Generic KASAN mode section), the following output turns up as well. It gives you insight into where the faulty-accessed page(s) is located and its ownership:
In the preceding screenshot, you can see KASAN justifying itself. It shows the actual memory region where the defect occurred and even points out the precise byte where it did (via the > for the row and ^ for the column symbols)! As a side effect of this bug, the kernel now disables all lock debugging. Further, KUnit says that running this first test case went well: ok 1 - kmalloc_oob_right.
Interpreting this information is important. It helps you drill down to what actually triggered the bug. We do just this in the section that follows!
In Figure 5.4, you can see the KASAN shadow memory revealing the defect's cause. We print the key line – the one prefixed with a right arrow symbol >:
>ffff8880318ad980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
^
These are the KASAN shadow memory bytes; each one represents 8 bytes of actual memory. The byte 03 is pointed at (by the symbol ^), telling us where the issue lies. What do the bytes 00, 03, and so on, mean? The details follow:
So, now you'll realize that the shadow byte 03 implies that the memory was partially accessible. The first 3 bytes (as here, N = 3) were legally accessible; the remaining 5 (8 – 3 = 5) bytes weren't. Let's take the trouble to verify this in detail. This is the line of code that triggers the bug, of course (it's here within the kernel code base):
// lib/test_kasan.c
static void kmalloc_oob_right(struct kunit *test)
[...]
size_t size = 123;
ptr = kmalloc(size, GFP_KERNEL);
[...]
KUNIT_EXPECT_KASAN_FAIL(test, ptr[size + OOB_TAG_OFF] = 'x');
Now, the variable size is set to the value 123 and OOB_TAG_OFF is 0 when CONFIG_KASAN_GENERIC is enabled. So, in effect, the (buggy) code is this:
ptr[123] = 'x';
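Now, Generic KASAN's memory granule size is 8 bytes. The first fifteen granules cover byte positions 0 to 119 (as 8 * 15 = 120), so the write to byte position 123 lands in the sixteenth granule, the one covering byte positions 120 to 127. This matches the shadow row shown above: fifteen 00 bytes followed by the 03. The diagram that follows clearly shows the memory buffer and how it's been overflowed: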
Check it out: Towards the right end, byte positions 120, 121, and 122 are valid and legal to read/write, but our KUnit KASAN test case deliberately wrote to byte position 123 – 1 byte beyond the end of the slab buffer, a clear OOB write overflow violation, and KASAN caught it! Not only that, as Figure 5.4 and Figure 5.5 clearly show, the kernel is intelligent enough to show the shadow value of 03 here, implying that the first 3 bytes are valid but the remaining 5 aren't – that's precisely the case!
Further, the surrounding bytes are set to the value 0xfc (see Figure 5.4). What does it mean? It's clear from the header – it's a red zone within the kernel SLUB object:
// mm/kasan/kasan.h
#ifdef CONFIG_KASAN_GENERIC
#define KASAN_FREE_PAGE 0xFF /* page was freed */
#define KASAN_PAGE_REDZONE 0xFE /* redzone for kmalloc_large allocations */
#define KASAN_KMALLOC_REDZONE 0xFC /* redzone inside slub object */
#define KASAN_KMALLOC_FREE 0xFB /* object was freed (kmem_cache_free/kfree) */
#define KASAN_KMALLOC_FREETRACK 0xFA /* object was freed and has free track set */
Back to our interpretation of Figure 5.4: The next line (BUG: KASAN: [...]) is just to show you that this continues with the next test case... KASAN has now caught the second test case's bug (KUNIT_CASE(kmalloc_oob_left)). The kernel log contains the same info as for the first defect: the bug summary by KASAN, the output of dump_stack() – the stack(s) call frames, who performed the allocation, the page-ownership info, and the memory state around the buggy access. This continues all the way to the thirty-eighth test case – fantastic!
A quick check of the kernel log shows what we expect – how the kernel's KUnit KASAN test case module has caught all 38 test cases with memory defects:
As can be clearly seen from the preceding screenshot, all 38 test cases are reported as ok (passed).
Exercise
Do perform what we've just done – running the kernel's KUnit KASAN test cases – on your box. Note, from the kernel log, the various KASAN test cases and verify that all ran correctly.
By the way, notice this:
$ lsmod |egrep "kunit|kasan"
test_kasan 81920 0
kunit 49152 1 test_kasan
In my particular case, you can see from the lsmod output that KUnit has been configured as a kernel module.
You can learn how to write your own suite of KUnit test cases. Do see the Further reading section for more on using KUnit!
Did you notice, in spite of having run all the KASAN KUnit test cases, there appear to be a few remaining generic memory defects (as we identified both in Chapter 4, Debug via Instrumentation – Using Kprobes, as well as in the What's the problem with memory anyway? section of this chapter) that the KUnit test cases don't cover?
So, I wrote a kernel module to exercise these test cases (when running the Generic KASAN-enabled debug kernel of course), along with some more interesting ones. To test against KASAN, remember to boot via your custom debug kernel, one that (obviously) has CONFIG_KASAN=y.
Due to space constraints, I won't show the entire code of our test module here (do refer to it on the book's GitHub repo and read the comments therein – you'll find it under the ch5/kmembugs_test folder). To get a flavor of it, let's take a peek at one of the test cases and how it's invoked. Here's the code of the UAR test case:
// ch5/kmembugs_test/kmembugs_test.c
/* The UAR - Use After Return - testcase */
static void *uar(void)
{
	volatile char name[NUM_ALLOC];
	volatile int i;

	pr_info("testcase 2: UAR:\n");
	for (i = 0; i < NUM_ALLOC - 1; i++)
		name[i] = 'x';
	name[i] = '\0';
	/* The deliberate bug: the address of a local variable is returned */
	return (void *)name;
}
The module is designed to be loaded up via a bash script named load_testmod and the test cases are run interactively (via a bash wrapper script named run_tests). The run_tests script (which you must run as root) displays a menu of available tests and asks you to select any one by typing in its assigned number. You can see a screenshot of the menu – and thus all the test cases you can try out – in Figure 5.8, in the section that follows.
The script then writes this number to our debugfs pseudofile here: /sys/kernel/debug/test_kmembugs/lkd_dbgfs_run_testcase. The debugfs write hook function then receives this data from userspace, validates it, and invokes the appropriate test case routine (via a rather long if-else ladder). This design allows you to test interactively and execute any test case(s) as many times as you wish to.
Here's a code snippet showing how our debugfs module code invokes the preceding uar() test case:
// ch5/kmembugs_test/debugfs_kmembugs.c
static ssize_t dbgfs_run_testcase(struct file *filp, const char __user *ubuf, size_t count, loff_t *fpos)
{
	char udata[MAXUPASS];
	volatile char *res1 = NULL, *res2 = NULL;
	[...]
	if (copy_from_user(udata, ubuf, count))
		return -EIO;
	udata[count-1] = '\0';
	pr_debug("testcase to run: %s\n", udata);
	/* Now udata contains the data passed from userspace - the testcase # to run (as a string) */
	if (!strncmp(udata, "1", 2))
		umr();
	else if (!strncmp(udata, "2", 2)) {
		res1 = uar();
		pr_info("testcase 2: UAR: res1 = \"%s\"\n",
			res1 == NULL ? "<whoops, it's NULL; UAR!>" : (char *)res1);
	} else if (!strncmp(udata, "3.1", 4))
	...
Clearly, this – test case #2 – is a defect, a bug. You know that local variables are valid only for their lifetime – while the function's executing. This, of course, is because local (or automatic) variables are allocated on the (kernel mode) stack frame of the process context in execution. Thus, you must stop referencing a local variable once outside the scope of its containing function. We (deliberately) don't! We attempt to fetch it as a return. The trouble is, by that time, it's gone...
Right, before diving into running the test cases (though there's no reason you can't run them right now), we divert into an interesting dilemma: how a known bug (like our UAR one) can at times appear to work perfectly fine.
The amazing (or crazy) thing about bugs like this one – the UAR defect – is that the code will sometimes seem to work! How come? It's like this: the memory holding the content of the local (automatic) variable is on the stack. Now, though we colloquially say that the stack frames are allocated on function entry and destroyed on function return (the so-called function prologue and epilogue), the reality isn't quite so dramatic.
The reality is that memory is typically allocated at page-level granularity. This includes the memory for stack pages. Thus, once a page of memory for the stack is allocated, there's usually enough for several frames (this, of course, depends on the circumstances). Then, when more memory for the stack is needed, it's grown (by allocating more pages, downwards, as it's the stack). The system knows where the top of the stack is by having the Stack Pointer (SP) register track this memory location. Also, you'll realize that the so-called "top of the stack" is typically the lowest legal address. Thus, when frames are allocated and/or a function is invoked, the SP register value reduces. When a function returns, the stack shrinks by adding to the SP register (remember, it's a downward-growing stack!). The following diagram is a representation of a typical kernel-mode stack on a (32-bit) Linux system:
So, it could well happen at some point that stale stack frames (and the corresponding data within them) exist underneath valid frames and could possibly be successfully referenced – without the system throwing a memory fault! – even later.
Carefully study Figure 5.7. As an example, we've shown the kernel-mode stack on a 32-bit Linux system, thus the size of the kernel stack will be 2 pages, typically 8 KB. Now, let's say the process context in execution (within the kernel) invoked these functions in this order (this is the call chain, shown as the circled steps 1 to 4 in the figure):
foo1() --> bar1() --> foo2() --> bar2()
Imagine we're at the leaf node, function bar2() in this example. It returns (circled step 5 above). This causes the SP register to get incremented back to the address of the call frame representing the function foo2(). So, though it remains intact on the stack, the stack memory of the call frame for function bar2() is now actually invalid! But an incorrect (read buggy) access to it might still succeed.
This should ideally not happen, but hey, it's an imperfect world, right?! The moral here: we require tools – and clear thinking is the best one – to catch tricky bugs such as UAR ones!
Right, back to our test cases! To run the tests, follow these steps:
cd <book_src>/ch5/kmembugs_test
./load_testmod
[...]
This should have the kernel module built and loaded into memory with dmesg showing that the debugfs pseudofile here – <debugfs_mountpt>/test_kmembugs/lkd_dbgfs_run_testcase – has been created.
sudo ./run_tests
Following is a screenshot showing that our test_kmembugs module is indeed loaded up (this was done via our load_testmod script), the menu shown via our run_tests script, and our running test case #2 – the UAR bug:
Here's an example screenshot of our test case framework catching the left OOB write buggy access via KASAN:
<...>/ch5/kmembugs_test/kmembugs_test.c:115:9: warning: function returns address of local variable [-Wreturn-local-addr]
115 | return (void *)name;
| ^~~~~~~~~~~~
This Is Important
It is your job as the programmer to carefully heed all compiler warnings and – as far as is humanly possible! – fix them.
Here's a sample run of the UAR test case:
$ sudo ./run_tests
[...]
(Type in the testcase number to run):
2
Running testcase "2" via test module now...
[ 144.313592] testcase to run: 2
[ 144.313597] test_kmembugs:uar(): testcase 2: UAR:
[ 144.313600] testcase 2: UAR: res1 = "<whoops, it's NULL; UAR!>"
$
All right, we're now done with several test cases with KASAN enabled. What's KASAN's scorecard like? The following section shows you just this.
What memory corruption bugs (defects) does KASAN actually manage to, and not manage to, catch? From our test runs, we tabulated the results in the table that follows. Do study it carefully, along with the notes that go with it:
You'll find the explanations for the footnote notations seen in the table (such as [C1], [U1], and so on) below.
warning: '<var>' is used uninitialized in this function [-Wuninitialized]
warning: function returns address of local variable [-Wreturn-local-addr]
* In function 'check_copy_size',
inlined from 'copy_from_user' at ./include/linux/uaccess.h:191:6,
inlined from 'copy_user_test' at <...>/ch5/kmembugs_test/kmembugs_test.c:482:14:
./include/linux/thread_info.h:160:4: error: call to '__bad_copy_to' declared with attribute error: copy destination size is too small
160 | __bad_copy_to();
| ^~~~~~~~~~~~~~~
The following section delves into the details – don't miss out!
The footnote notations for KASAN ([K1], [K2], and so on) are explained in detail here. It's really important to read through all the notes, as we've mentioned certain caveats and corner cases as well:
global-out-of-bounds in <func>+0xstart/0xlen [modname]
Read/Write of size <n> at addr <addr> by task <taskname/PID>
The report will contain one of Read or Write depending upon whether a read or write buggy access occurred.
stack-out-of-bounds in <func>+0xstart/0xlen [modname]
Read/Write of size <n> at addr <addr> by task <taskname/PID>
BUG: KASAN: slab-out-of-bounds in <func>+0xstart/0xlen [modname]
Read/Write of size <n> at addr <addr> by task <taskname/PID>
BUG: KASAN: use-after-free in <func>+0xstart/0xlen [modname]
Read/Write of size <n> at addr <addr> by task <taskname/PID>
BUG: KASAN: double-free or invalid-free in <func>+0xstart/0xlen [modname]
Tip – the All-Results-in-One-Place Table
For your ready reference, in Part 2 of this key topic (the next chapter), in the Catching memory defects in the kernel – comparisons and notes (Part 2) section, Table 6.4 tabulates our test case results for our test runs with all the tooling technologies – vanilla/distro kernel, compiler warnings, with KASAN, with UBSAN, and with SLUB debug – we employ in this chapter. In effect, it's a compilation of all the findings in one place, thus allowing you to make quick (and hopefully helpful) comparisons.
Did you notice that the kernel's built-in KUnit-based KASAN test module, test_kasan, has no test cases for three memory defects – the UMR, the UAR, and memory leaks? Why? Simple: KASAN does not catch these bugs! Okay, so what can we conclude? Well, the KUnit (and other) test suites are often run in an automated fashion, where the expected end result is that all viable test cases pass; in fact, they must pass. That wouldn't happen if the suite contained tests for these three defects, so it doesn't. Now, don't read this wrong – it's simply the way the test suites are designed. There certainly exist other means besides KASAN by which these defects can be caught. Relax – we'll get there and catch them.
Here and now, we're showing that KASAN itself doesn't catch these particular nasty bugs. Later in the book, we'll see which tools do.
FYI, KASAN is a key component to catching difficult-to-find bugs via the fuzzing approach. Syzkaller (aka syzbot) – the de facto powerful Linux kernel fuzzer – requires KASAN to be configured in the kernel! We cover fuzzing briefly in Chapter 12, A few More Kernel Debug Approaches, in the What is fuzzing? section. Be sure to check it out.
Good going – you now know how to leverage the power of KASAN to help catch those tricky memory bugs! Let's now move on to using UBSAN.
One of the serious issues with a language such as C is that the compiler produces code for the correct case, but when the source code does something unexpected or just plain wrong, the compiler often does not understand what to do – it simply and blithely ignores such cases. This actually helps in the generation of highly optimized code at the cost of (possible security) bugs! Examples of this are common: overflowing/underflowing an array, arithmetic defects (such as dividing by zero or overflowing/underflowing a signed integer), and so on. Even worse, at times the buggy code seems to work (as we saw with accessing stale stack memory in the Stale frames – trouble in paradise section). Similarly, bad code might work in the presence of optimization, or not. Thus, cases such as these cannot be predicted and are called Undefined Behavior (UB).
The kernel's Undefined Behavior Sanitizer (UBSAN) catches several types of runtime UB. As with KASAN, it uses Compile Time Instrumentation (CTI) to do so. With UBSAN enabled fully, the kernel code is compiled with the -fsanitize=undefined option switch. The UB caught by UBSAN includes the following:
Some of these defects in fact overlap with what Generic KASAN catches as well. UBSAN-instrumented code is certainly larger and slower (by a factor of 2 to 3). Still, it's very useful – especially during development and unit testing – to catch UB defects. In fact, enabling UBSAN on production systems is feasible if you can afford the larger kernel text size and processor overheads (on everything besides tiny embedded systems, you probably can).
Within the make menuconfig UI, you'll find the menu system for UBSAN at Kernel hacking | Generic Kernel Debugging Instruments | Undefined behaviour sanity checker.
A screenshot of the relevant menu is seen here:
To work with it, you should turn on the following kernel configs: CONFIG_UBSAN, CONFIG_UBSAN_BOUNDS (performs bound checking on array indices for static arrays – very useful!), CONFIG_UBSAN_MISC, and CONFIG_UBSAN_SANITIZE_ALL (you can look up the details for each here: lib/Kconfig.ubsan). Setting CONFIG_TEST_UBSAN=m has the lib/test_ubsan.c code built as a module.
UBSAN – Effect on the Build
With CONFIG_UBSAN=y, building the kernel source tree by passing the V=1 parameter will show the details, the GCC flags being passed, and more. Here's a snippet of what you see focused on the GCC flags passed during the build due to UBSAN being enabled:
make V=1
gcc -Wp,-MMD,[...] -fsanitize=bounds -fsanitize=shift -fsanitize=integer-divide-by-zero -fsanitize=unreachable -fsanitize=signed-integer-overflow -fsanitize=object-size -fsanitize=bool -fsanitize=enum [...]
Detecting UB on OOB (static) array accesses (and the like) is where UBSAN shines. Take, for example, our test case #4.4. We define a few static global arrays like this:
static char global_arr1[10], global_arr2[10], global_arr3[10];
Why Declare Three Global Arrays and Not Just One?
Well, as of this writing, there seems to be an issue with the way that the GCC compiler (at least as of version 9.3) sets up red zoning for global data. We observe that the first global in a module may not have its left red zone correctly set up, causing left OOB (underflow) buggy accesses to be missed as a side effect! So, by setting up three global arrays and passing the pointer to any but the first (we set up our test cases to pass the pointer to the second one), KASAN and UBSAN should be able to catch the buggy access! (Do note that the ordering of global variables within a module depends on the linker.) This issue does not seem to occur with Clang 11+.
Interestingly, our efforts on this will eventually pay off: due to my reporting the issue – left OOB failing with GCC – as well as pointing out that the kernel's test_kasan module doesn't test for it, Marco Elver (the current KCSAN maintainer) has investigated this and added a patch to include this test case – add globals left-out-of-bounds test – to the test_kasan module (17 Nov 2021 – see here: https://lore.kernel.org/all/[email protected]/T/#u). Further, this book's very able technical reviewer, Chi-Thanh Hoang, has figured out that this is essentially due to GCC's lack of a left red zone (as mentioned above) and added this information to the kernel Bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=215051). The hope is that GCC maintainers will pick this up and suggest or implement a fix.
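To see why a missing left red zone lets an underflow slip by, consider this deliberately simplified user-space analogy. Do note that this is not how KASAN is implemented (it tracks validity via shadow memory and checks at access time); here, we merely place poisoned guard bytes on either side of an array and inspect them afterward. The structure, the helper names, the red zone width, and the poison value are all assumptions made for illustration, and ARRSZ is assumed to be 10, matching the arrays above:

```c
#include <stddef.h>
#include <string.h>

#define ARRSZ   10      /* assumed to match the global arrays in the text */
#define RZ_SIZE 8       /* illustrative red zone width (an assumption) */
#define RZ_BYTE 0xFA    /* illustrative poison value (an assumption) */

/* An array sandwiched between a left and a right red zone */
struct guarded_arr {
    unsigned char left_rz[RZ_SIZE];
    char data[ARRSZ];
    unsigned char right_rz[RZ_SIZE];
};

/* Poison both red zones and clear the payload */
static void guarded_init(struct guarded_arr *g)
{
    memset(g->left_rz, RZ_BYTE, sizeof(g->left_rz));
    memset(g->data, 0, sizeof(g->data));
    memset(g->right_rz, RZ_BYTE, sizeof(g->right_rz));
}

/* Returns 0 if both red zones are intact, -1 if the left one was
 * overwritten, 1 if the right one was overwritten */
static int guarded_check(const struct guarded_arr *g)
{
    size_t i;

    for (i = 0; i < RZ_SIZE; i++)
        if (g->left_rz[i] != RZ_BYTE)
            return -1;
    for (i = 0; i < RZ_SIZE; i++)
        if (g->right_rz[i] != RZ_BYTE)
            return 1;
    return 0;
}
```

If the left_rz member simply didn't exist, a write just before data[0] would land in whatever memory precedes the array and guarded_check() would have nothing to compare against. That's the essence of the missed left OOB case.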
Below, we show one of our buggy test cases – the right OOB accesses on global memory – accessing one of these global arrays, incorrectly of course, for both read and write (I only show a portion of its code here). Note that the parameter p is a pointer to a piece of global memory within this module, typically the second one, global_arr2[]:
Here's its invocation via our debugfs hook:
[...] else if (!strncmp(udata, "4.4", 4))
global_mem_oob_left(WRITE, global_arr2);
Here's the (partial) code (note that the // style comments might spill over a line here; in the code they're fine):
int global_mem_oob_right(int mode, char *p)
{
volatile char w, x, y, z;
volatile char local_arr[20];
char *volatile ptr = p + ARRSZ + 3; // OOB right
[...]
} else if (mode == WRITE) {
*(volatile char *)ptr = 'x'; // invalid, OOB right write
p[ARRSZ - 3] = 'w'; // valid and within bounds
p[ARRSZ + 3] = 'x'; // invalid, OOB right write
local_arr[ARRAY_SIZE(local_arr) - 5] = 'y'; // valid and within bounds
local_arr[ARRAY_SIZE(local_arr) + 5] = 'z'; // invalid, OOB right write
} [...]
Once it detects a buggy access to memory (like the ones above), UBSAN displays an error report like this to the kernel log:
array-index-out-of-bounds in <C-source-pathname.c>:<line#>
index <index> is out of range for type '<var-type> [<size>]'
Here's a screenshot showing just this. The right window shows the kernel log. For this case, ignore the top portion of the log – it's part of the error report from KASAN. The remainder – what we're interested in – is from UBSAN:
Here you can see how UBSAN has precisely caught the UB on line 194 – the attempt to write past the last legal index of the local (stack-based) array! Of course, it's entirely possible the line number you see here might change over time due to modifications to the code.
After this, test case # 4.3 intentionally, adventurously – and disastrously – now attempts a read underflow on a local stack memory variable. This too is cleanly caught by UBSAN! The following partial screenshot shows you the juicy bit:
Again, UBSAN even shows the source filename and line number where the buggy access was attempted!
It's more generic: UBSAN catches memory accesses when the variable in question indexes the static memory array incorrectly – when the index is out of bounds in any manner (left or right, underflow or overflow). It does appear, though, to miss buggy accesses made purely via pointers! KASAN has no issue with this and catches them all.
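Why would index-based accesses be easier to check than pointer-based ones? The following user-space sketch offers one way to think about it (it's a hand-written analogy, not what the compiler actually emits; the function names and the demo array are hypothetical). When indexing a true array, the length is known from the type, so a bounds test can be placed right before the access. With a bare char *, the type carries no length at all, so the caller has to pass it along explicitly:

```c
#include <stddef.h>

#define ARRSZ 10    /* assumed array size, as in the test case above */

static char demo_arr[ARRSZ];

/* Index-based access: sizeof(demo_arr) is known from the array type,
 * so the index can be validated before the store */
static int store_by_index(long idx, char val)
{
    if (idx < 0 || (size_t)idx >= sizeof(demo_arr))
        return -1;    /* an instrumented build would report here */
    demo_arr[idx] = val;
    return 0;
}

/* Pointer-based access: the callee only sees a char *, so the valid
 * length must be supplied by the caller for any check to be possible */
static int store_by_ptr(char *base, size_t len, long off, char val)
{
    if (!base || off < 0 || (size_t)off >= len)
        return -1;
    base[off] = val;
    return 0;
}
```

For instance, store_by_index(ARRSZ + 3, 'x') and store_by_ptr(demo_arr, sizeof(demo_arr), -1, 'x') are both rejected, whereas store_by_index(ARRSZ - 3, 'w') succeeds.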
Just as we saw with KASAN (in the Remaining tests with our custom buggy kernel module section), UBSAN also cannot catch all memory defects. To prove this, we again run our custom buggy kernel module (in ch5/kmembugs_test), with pretty much identical results: even on a UBSAN-enabled kernel, these three bugs – the UMR, UAR, and memory leakage bugs – aren't caught! The following screenshot tells the story (to capture this, I first ran the run_tests script for the first three test cases with the --no-clear parameter, in order to preserve the kernel log content):
Also, don't forget: UBSAN is quite adept at catching arithmetic-related UB too – things such as overflowing or underflowing arithmetic calculations, the well-known Integer OverFlow (IoF) defect, and the divide-by-zero bugs being common and dangerous ones indeed! We mentioned the arithmetic UB that UBSAN can catch at the beginning of this section on UBSAN. We don't delve further into it as our topic is memory defects. To see more of UBSAN in action, you can always read the code of the UBSAN test module within the kernel (lib/test_ubsan.c) and try it out – I encourage you to do so. On a somewhat related note, understanding what unaligned memory access is, how it can cause issues, and how to avoid it is the topic of this kernel documentation page: Unaligned Memory Accesses: https://www.kernel.org/doc/html/latest/core-api/unaligned-memory-access.html#unaligned-memory-accesses.
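As a small illustration of sidestepping arithmetic UB in the first place, here's a user-space sketch of checked addition and division. The helper names are hypothetical; __builtin_add_overflow() is a builtin provided by both GCC and Clang. (Within kernel code, the helpers in include/linux/overflow.h, such as check_add_overflow(), serve a similar purpose.)

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true if a + b fits in an int, with *res holding the sum.
 * On overflow it returns false; *res then holds the wrapped value
 * and should be ignored.
 */
static bool add_checked(int a, int b, int *res)
{
    /* The builtin itself returns true when the addition overflows */
    return !__builtin_add_overflow(a, b, res);
}

/*
 * Returns true and stores num / den in *res, or false (leaving *res
 * untouched) for the two undefined cases: division by zero and
 * INT_MIN / -1, whose quotient is not representable in an int.
 */
static bool div_checked(int num, int den, int *res)
{
    if (den == 0)
        return false;
    if (num == INT_MIN && den == -1)
        return false;
    *res = num / den;
    return true;
}
```

Here, add_checked(INT_MAX, 1, &r) and div_checked(1, 0, &r) both return false instead of invoking UB, leaving it to the caller to handle the error.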
Okay, let's tabulate the result of our experiments by running various test cases with UBSAN enabled within the kernel. Refer to the following table:
The following is with respect to the numeric footnotes in the preceding table:
The following section delves into the details. Don't miss out!
The footnote notations in the preceding table (such as [U1], [U2], and so on) are explained in detail here. It's important to read through all the notes, as we've mentioned certain caveats and corner cases as well:
array-index-out-of-bounds in <C-source-pathname.c>:<line#>
index <index> is out of range for type '<var-type> [<size>]'
object-size-mismatch in <C-source-pathname.c>:<line#>
store to address <addr> with insufficient space for an object of type '<var-type>'
In the preceding cases, UBSAN also reports the actual violation in some detail along with the process context and kernel-mode stack call trace.
Note though, that with KASAN turned off (I rebuilt a test debug kernel with CONFIG_KASAN=n) and UBSAN turned on, the semantics seem a bit different: in this case, I got a segfault only, with, of course, the kernel log clearly showing the source of the bug (by looking up what the instruction pointer register, here, RIP, was pointing to at the time of the fault).
Note
As mentioned earlier, don't forget to look up Table 6.4 in the following chapter, effectively, an all-results-in-one-place comparison table.
Great, now you're much better armed to catch memory bugs with both KASAN and UBSAN! I suggest you first take the time to absorb all this information, read the relevant detailed notes in the later Catching memory defects in the kernel – comparisons and notes (Part 1) section (pertaining to KASAN and UBSAN, at least for now), and practice trying out these test cases on your own. But wait: we saw that some OOB defects are only caught when compiled with Clang 11 or later. This is a key thing. So, let's now learn how to use the modern Clang compiler.
Low Level Virtual Machine (LLVM) is the original name given to this modular compiler tooling project. It now doesn't have much to do with traditional virtual machines and is instead a powerful backend for several compilers and toolchains.
Clang (the pronunciation rhymes with "slang") is a modern compiler frontend technology for C-type languages (includes support for C, C++, CUDA, Objective C/C++, and more) and is based on the LLVM compiler. It's considered a drop-in replacement for GCC. Clang currently seems to have a significant advantage over GCC – especially from our point of view – generating superior diagnostics as well as being able to intelligently generate code avoiding OOB accesses. This is critical. It paves the way to superior code. We saw (in the previous section on KASAN) that faulty left-OOB accesses on global memory, not reliably caught by GCC (versions 9.3, 10, and 11), are caught with Clang! The Android project is a key user of Clang, among many others.
Attempting to build your kernel module with Clang while the target kernel itself is compiled via GCC is simply not good enough! You'll have to use the same compiler for both – the underlying ABI needs to be completely consistent (this was one of the many things pointed out to me by Marco Elver when I was puzzled and asked why KASAN failed to catch certain test cases – again, the beauty of open source development). So, the upshot of it all is that we'll have to compile both our kernel and module with Clang 11.
Installing Clang and associated binaries in order to successfully compile your kernel module involves running the following command (on our Ubuntu 20.04 LTS guest):
sudo apt install clang-11 --install-suggests
Further, we seem to require setting up a soft link to llvm-objdump-11 named llvm-objdump (this is likely as I have both Clang 10 and Clang 11 installed simultaneously):
sudo ln -s /usr/bin/llvm-objdump-11 /usr/bin/llvm-objdump
Hang on, a simpler approach follows...
For the purpose of using Clang on the kernel and module builds, instead of installing Clang 11 (or later) on Ubuntu 20.04 LTS, it might just be simpler to install Ubuntu 21.10 (I've done so as an x86_64 VM) as it ships with Clang 13 preinstalled. I then built the very same 5.10.60 kernel as a debug kernel, applying a similar debug config as was discussed back in Chapter 1, A General Introduction to Debugging Software, but this time with Clang.
Importantly, to specify using Clang (and not GCC) as the compiler, when building the kernel, set the CC variable to it:
$ time make -j8 CC=clang
SYNC include/config/auto.conf.cmd
*
* Restart config...
* Memory initialization
*
The first time you run this command, the kbuild system detects that with the Clang compiler, certain add-ons now become available and viable to use (that couldn't be used with GCC) and prompts us to configure it:
Initialize kernel stack variables at function entry
> 1. no automatic initialization (weakest) (INIT_STACK_NONE)
2. 0xAA-init everything on the stack (strongest) (INIT_STACK_ALL_PATTERN) (NEW)
3. zero-init everything on the stack (strongest and safest) (INIT_STACK_ALL_ZERO) (NEW)
choice[1-3?]:
Though it would be very useful to take advantage of this auto-initialization of kernel stack variables, I deliberately left it at the default (option 1) in order to check our tooling to catch the UMR defect. Similarly, the build asked the following. Here, I kept the defaults by simply pressing the Enter key. You could change them if you wish to:
Enable heap memory zeroing on allocation by default (INIT_ON_ALLOC_DEFAULT_ON) [Y/n/?] y
Enable heap memory zeroing on free by default (INIT_ON_FREE_DEFAULT_ON) [Y/n/?] y
*
* KASAN: runtime memory debugger
*
KASAN: runtime memory debugger (KASAN) [Y/n/?] y
KASAN mode
> 1. Generic mode (KASAN_GENERIC)
choice[1]: 1
[...]
Back mappings in vmalloc space with real shadow memory (KASAN_VMALLOC) [Y/n/?] y
KUnit-compatible tests of KASAN bug detection capabilities (KASAN_KUNIT_TEST) [M/n/?] m
[...]
Once built, perform the usual remaining steps, not forgetting to pass the CC=clang variable assignment on the make command line:
sudo make CC=clang modules_install && sudo make CC=clang install
When done, reboot and ensure you boot into your spanking new Clang-built debug kernel! Verify with the following:
$ cat /proc/version
Linux version 5.10.60-dbg02 (letsdebug@letsdebug-VirtualBox) (Ubuntu clang version 13.0.0-2, GNU ld (GNU Binutils for Ubuntu) 2.37) #4 SMP PREEMPT Wed ...
Now, let's move on to building our kernel module with Clang:
cd <book_src>/ch5/kmembugs_test
make CC=clang
That's it – I've conditionally embedded this setting of the CC variable into our load_testmod bash script, based on which compiler was used to build the current kernel. Also, FYI, to distinguish between our custom debug kernel built with Clang and GCC, the former's uname -r output shows up as seen here, 5.10.60-dbg02, whereas the latter's name shows up as 5.10.60-dbg02-gcc.
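The idea of picking the compiler based on how the running kernel was built can be sketched as follows. Do note that our actual load_testmod script is written in bash; this C rendering, with its hypothetical function names and its simple substring match on the /proc/version banner, is only an approximation of the logic and not the script's real code:

```c
#include <stdbool.h>
#include <string.h>

/* True if the kernel version banner (the content of /proc/version)
 * mentions clang; a plain substring match, which is an assumption */
static bool kernel_built_with_clang(const char *version_banner)
{
    return version_banner && strstr(version_banner, "clang") != NULL;
}

/* Value to pass as CC= when building a module for this kernel */
static const char *cc_for_module_build(const char *version_banner)
{
    return kernel_built_with_clang(version_banner) ? "clang" : "gcc";
}
```

Fed the banner shown earlier (the one containing Ubuntu clang version 13.0.0-2), cc_for_module_build() returns "clang"; for a banner that doesn't mention clang, it falls back to "gcc". A substring match is crude (a kernel release string that happened to contain the word would fool it), so treat it as a starting point only.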
Exercise
I'll leave it as an exercise to you to build both a (debug) kernel as well as our test_kmembugs.ko kernel module with Clang and run the test cases.
With this, we complete the first part of our detailed coverage on understanding and catching memory defects within the kernel! Great going. Let's complete this chapter with a kind of summarization of the many tools and techniques we've used so far.
As we've already mentioned in this chapter, the following table tabulates our test case results for our test runs with all the tooling technologies/kernels – vanilla/distro kernel, compiler warnings, and with KASAN and UBSAN with our debug kernel – we employed in this chapter. In effect, it's a compilation of all our findings so far in one place, thus allowing you to make quick (and hopefully helpful) comparisons:
Of course, the explanations of the footnotes within this table (such as [C1], [K1], [U1], and so on) can be found in the earlier relevant section.
So, here's a very brief summary:
Regarding the preceding table, a few remaining notes now follow...
A few more points regarding Table 5.5:
A quick note on a KASAN alternative, especially for production systems, follows.
The Linux kernel has recent tooling named Kernel Electric-Fence (KFENCE). It's available from kernel version 5.12 onward (very recent, as of this writing).
KFENCE is described as a low-overhead sampling-based memory safety error detector of heap use-after-free, invalid-free, and out-of-bounds access errors.
It has recently added support for both x86 and ARM64 architectures with hooks to both the SLAB and SLUB memory allocators within the kernel. Why is KFENCE useful when we already have KASAN (which seems to overlap in function with it)? Here are a few points to help differentiate between them:
To enable KFENCE, set CONFIG_KFENCE=y (note, though, that as it's very recent, this config option isn't present in the 5.10 kernel series we work upon in this book). You can see more options and fine-tune them based on options present in the lib/Kconfig.kfence file.
We refer you to the details (including setup, tuning, interpreting error reports, internal implementation, and more) in the official kernel documentation page on KFENCE here: https://www.kernel.org/doc/html/latest/dev-tools/kfence.html#kernel-electric-fence-kfence.
A final point: the 5.18 kernel (the latest stable one as of this writing) introduces stricter compile-time bounds checking for the memcpy() API family (covering the memcpy(), memmove(), and memset() APIs). It internally uses the compiler fortification feature (the kernel config is called CONFIG_FORTIFY_SOURCE). Turning this on helps catch a large class of typical buffer overflow defects within the kernel! Read more in this LWN article: Strict memcpy() bounds checking for the kernel: https://lwn.net/Articles/864521/.
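To get a feel for the kind of check that bounds-checked copying performs, here's a hand-rolled user-space sketch. It is emphatically not the kernel's fortified memcpy() implementation; the function and macro names are hypothetical, and the check here happens at runtime against a size that the caller (or sizeof) supplies:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes from src to dst only if they fit within dst_size.
 * Returns 0 on success, -1 if the copy would overflow the destination
 * (or if a pointer is NULL); nothing is written in the failure case.
 */
static int copy_bounded(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (!dst || !src || n > dst_size)
        return -1;
    memcpy(dst, src, n);
    return 0;
}

/*
 * Convenience wrapper for true arrays only: sizeof(dst) is then the
 * size of the whole destination array. (With a pointer argument,
 * sizeof would yield the pointer's size, so don't use it that way.)
 */
#define COPY_TO_ARRAY(dst, src, n) \
    copy_bounded((dst), sizeof(dst), (src), (n))
```

With char small[8]; as the destination, COPY_TO_ARRAY(small, msg, 11) is refused, while COPY_TO_ARRAY(small, msg, 4) goes through. The key ingredient is that the destination's size is known at the point of the copy, which is exactly the information a fortified build tries to recover from the types involved.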
With a non-managed programming language such as C, a trade-off exists: high power and the ability to code virtually anything you can imagine but at a significant cost. With memory being managed directly by the programmer, slipping in memory defects – bugs! – of all kinds, is rather easy to do, even for experienced folk.
In this chapter, we covered many tools, techniques, and approaches in this regard. First, you learned about the different (scary) types of memory defects. Then, we delved into how to use various tools and techniques to identify them and thus be able to fix them.
One of the most powerful tools in your arsenal for detecting memory bugs is KASAN. You learned how to configure and use it. We first learned how to use the kernel's built-in KUnit test framework to run memory test cases for KASAN to catch. We then developed our own custom module with test cases and even a neat way to test, via a debugfs pseudofile and custom scripts.
Catching UB with UBSAN came next. You learned how to configure it and leverage it to catch these kinds of defects, often overlooked, leading to not only buggy headaches but even security holes in production systems!
We learned that while GCC is solid and has been around for decades, a newer compiler, Clang, is in fact proving more adept at generating useful diagnostics (on our C code) and catching bugs that even GCC can miss! You saw how to use Clang to build the kernel and your modules, helping create more robust software, in effect.
As we covered these tools and frameworks, we tabulated the results, showing you the bugs a given tool can (or cannot) catch. To then summarize the whole thing, we built a larger table with columns covering all the test cases and all the tools – a quick and useful way for you to see and compare them (Table 5.5)! Note that we'll add to this table in the following chapter! Finally, we mentioned that the (very recent) KFENCE framework can (should) be used on production systems, in lieu of KASAN. The 5.18 kernel's CONFIG_FORTIFY_SOURCE config will likely be a big help as well.
So, congrats on completing this rather long – and really important – first chapter on catching memory bugs in kernel space! Do take the time to digest it and practice all you've learned. When set, I encourage you to move on to the next chapter where we'll complete our coverage on catching kernel memory defects.
With regard to the 5.18 mainline kernel: Strict memcpy() bounds checking for the kernel, Jonathan Corbet, July 2021: https://lwn.net/Articles/864521/