
Static Analysis


Introduction

Memory is a crucial topic when it comes to building software. Programs running on early devices referenced physical memory locations directly, an approach that has since been superseded by a separate component: the Memory Management Unit (MMU). We now work with virtual memory addresses that the MMU manages for us[0].

I recently stumbled across a video regarding the Rust™ language. The intent of this post is not to look down on the Rust™ language but to observe the behaviour of a trivial C program. A program fragment similar to the one in Listing 1 was shown in the video. After building the executable, we notice that the program exits normally, even though it writes to a region of memory that was already "freed". In contrast, the Rust™ compiler reports this issue during compilation.


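#include <stdlib.h>

int
main (void)
{
    /* Allocate room for one int on the heap. */
    int *x = malloc(sizeof(int));

    /* Hand the memory back to the allocator. */
    free(x);

    /* Use-after-free: x still holds the old address. */
    *x = 0xA455;

    exit(0);
}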
Listing 1: malloc(3) and free(3)

One would assume that the process running the program from Listing 1 (as seen in Listing 2) should have received the segmentation violation signal (SIGSEGV), since it attempted to access a freed memory region. The reality is more complex than described here. In essence, malloc-like memory allocation library functions do more than just allocate memory. Demystifying malloc is not the purpose for now; interested readers can browse link [1] provided at the end. To request any operation from the system, we use system calls. To request more memory for a process, we use the mmap(2) system call.


Script started on Wed Aug 27 16:13:10 2025

bash-3.2$ ./listing1
bash-3.2$ echo $?
0
bash-3.2$ exit

Script done on Wed Aug 27 16:13:16 2025
Listing 2: Behavior of program from Listing 1
tip

The output of the program listing1 (and others) is captured using the script(1) utility. The captured output may contain control characters, which are then removed using the col(1) command. The commands are used as follows:

$ SHELL=/bin/bash script <output-file-name>
...

$ col -b < <input-file-name> > <output-file-name>

My default shell is zsh. By default, script(1) uses the SHELL environment variable to decide which shell to start. My zsh configuration contains coloring and other "special" characters that appear in the output file. Unfortunately, even col (with the -b flag) is not able to clean them out of the terminal output. For simplicity, I chose to show the output from the bash shell.

mmap(2) System Call

As the name suggests, mmap(2) is used to map a file, described by a file descriptor, into memory. But it also allows anonymous mappings. It is more general than malloc-like library functions. For instance, we can specify the protection of the memory region. By default, this system call assumes that the caller wishes to map a region of a file into memory, so we need to state explicitly that we intend to map anonymous memory and not a file. The return value of this system call is the starting address of the mapped memory. Where the mapping is placed is implementation defined; the address typically lies somewhere between the stack and the heap of the process.
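The points above (anonymous mapping, explicit protection, returned start address) can be sketched in a small helper. This is a minimal sketch and not part of the original program: the function name `anon_roundtrip` and its out-parameter are hypothetical, and `PROT_READ | PROT_WRITE` is an assumed protection chosen so that the value can be read back.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous, private region, store a value
 * in it, read the value back into *out, and unmap the region again.
 * Returns 0 on success and -1 if mmap(2) or munmap(2) fails. */
int
anon_roundtrip (int value, int *out)
{
    size_t len = sizeof(int);

    int *p = mmap(NULL,                        /* no placement hint */
                  len,
                  PROT_READ | PROT_WRITE,      /* protection is chosen by the caller */
                  MAP_ANONYMOUS | MAP_PRIVATE, /* no file backs this mapping */
                  -1,                          /* ignored for anonymous mappings */
                  0);                          /* ignored for anonymous mappings */
    if (p == MAP_FAILED)
        return -1;

    *p = value;
    *out = *p;

    if (munmap(p, len) == -1)
        return -1;

    return 0;
}
```

Calling `anon_roundtrip(0xA455, &out)` from `main` should leave `0xA455` in `out`. Unlike malloc(3), failure is reported through `MAP_FAILED` rather than `NULL`, which is why the check compares against that constant.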

Listing 3 shows a program analogous to the one shown in Listing 1. The first argument to mmap(2) is an address that the kernel uses as a "hint" for where to place the start of the mapped region. Unless the MAP_FIXED flag is used, any existing mapping at the requested address is not replaced. If mmap(2) is called with the MAP_FIXED flag and the first argument is an address that already contains a mapping, the previous mapping is replaced upon successful return. The use of MAP_FIXED is discouraged if portability is a consideration.


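#include <sys/mman.h>
#include <stdlib.h>

int
main (void)
{
    /* Map an anonymous, private region large enough for one int. */
    int *x = mmap(NULL,
                  sizeof(int),
                  PROT_WRITE,
                  MAP_ANONYMOUS | MAP_PRIVATE,
                  /* ignored */ -1,
                  /* ignored */ 0);

    /* Return the region to the kernel. */
    munmap(x, sizeof(int));

    /* Use-after-unmap: the address is no longer mapped. */
    *x = 0xA455;

    exit(0);
}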
Listing 3: mmap(2) and munmap(2)
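The "hint" behaviour of the first argument can be observed with a short experiment. The following is only a sketch under assumptions: the function name `hint_keeps_existing_mapping` is hypothetical, and the second call deliberately passes an address that is already in use, without MAP_FIXED, to check that the first mapping survives.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if a second mmap(2) that uses an occupied
 * address as a hint (without MAP_FIXED) was placed elsewhere and left the
 * first mapping intact, 0 if not, and -1 if a mapping could not be created. */
int
hint_keeps_existing_mapping (void)
{
    size_t len = sizeof(int);
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;

    int *first = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (first == MAP_FAILED)
        return -1;
    *first = 0xA455;

    /* The hint points at a live mapping; MAP_FIXED is not set. */
    int *second = mmap(first, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (second == MAP_FAILED) {
        munmap(first, len);
        return -1;
    }

    /* The new region must live at a different address, and the old
     * contents must still be readable. */
    int untouched = (second != first) && (*first == 0xA455);

    munmap(second, len);
    munmap(first, len);
    return untouched;
}
```

Adding MAP_FIXED to the second call would instead replace the first mapping, which is the behaviour the text warns about.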

When the program from Listing 3 is compiled and executed, we see the behavior shown in Listing 4. The process terminates due to a segmentation violation. When this program is executed inside a debugger, you'll notice that the signal is received when the memory address is dereferenced for the assignment of a value.


Script started on Wed Aug 27 16:18:50 2025

bash-3.2$ ./listing3
Segmentation fault: 11
bash-3.2$ echo $?
139
bash-3.2$ exit

Script done on Wed Aug 27 16:19:01 2025
Listing 4: Behavior of program from Listing 3
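The exit status 139 in Listing 4 is how bash reports a process killed by a signal: 128 plus the signal number, and SIGSEGV is signal 11 here. A rough way to observe the signal without a debugger is to run the faulting steps in a child process and inspect its wait status. This is a sketch, not the author's method; the function name `signal_after_munmap` is hypothetical, and the `volatile` qualifier is an added precaution so that the compiler keeps the store.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS under strict -std= modes */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: repeats the steps of Listing 3 in a child process.
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited normally, and -1 if fork(2) or waitpid(2) failed. */
int
signal_after_munmap (void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: make sure SIGSEGV has its default action (terminate). */
        signal(SIGSEGV, SIG_DFL);

        size_t len = sizeof(int);
        volatile int *x = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (x == MAP_FAILED)
            _exit(2);

        munmap((void *)x, len);
        *x = 0xA455; /* the page is no longer mapped, so this store faults */
        _exit(0);    /* not reached when the store faults */
    }

    /* Parent: collect the child's termination status. */
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On the systems discussed in this post the returned value should equal `SIGSEGV`; the parent survives because only the child touches the unmapped address.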

As mentioned earlier, malloc(3) does not simply allocate memory and return the address of the allocated region. This function internally performs various memory management operations (as can be seen in musl's implementation). As we've seen in Listing 3, a segmentation violation occurs if a memory region received from mmap(2) is accessed after it has been unmapped using munmap(2). I have yet to explore the actual implementation of the free function, but at a glance, it looks like the process "advises" the system that the information contained in the memory region is not needed and can be reused right away. A process provides such advice to the system through the madvise system call. Eventually, the free(3) function may also invoke the munmap(2) system call.
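The difference between advising and unmapping can be sketched as follows. This is an illustration under assumptions, not a description of what any particular free(3) does: the function name `advise_then_reuse` is hypothetical, and `MADV_DONTNEED` is an advice value picked for the example, since the text does not say which advice an allocator passes.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose madvise and MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: advise the kernel that a page's contents are no
 * longer needed, then show that the page is still mapped by writing to it.
 * Returns 0 if the page stayed usable and -1 on any failure. */
int
advise_then_reuse (void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0xA4;

    /* Assumed advice value: the contents may be discarded, the mapping stays. */
    if (madvise(p, len, MADV_DONTNEED) == -1) {
        munmap(p, len);
        return -1;
    }

    /* Unlike after munmap(2), this store does not fault. What a read
     * would have returned before this store is platform dependent. */
    p[0] = 0x55;
    int ok = (p[0] == 0x55);

    munmap(p, len);
    return ok ? 0 : -1;
}
```

This is consistent with the behaviour of Listing 1: memory that has only been advised away, or kept by the allocator for reuse, remains mapped, so a later access does not raise SIGSEGV.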

The Rust™ compiler is able to detect such errors by virtue of static analysis of the source file. During compilation, the Rust™ compiler performs the various checks that most compilers perform for their respective languages. In addition to this, it statically analyzes the source file for memory-related issues such as this one (use-after-free), along with other ownership model checks. This does not mean that compilers for the C language do not support this feature. For instance, gcc provides the -fanalyzer option, which can be used during compilation to perform inter-procedural analysis. clang from llvm also provides a similar feature, but the option is called --analyze. clang also provides a command-line utility called scan-build that is used during the build process.

As we can see in Listing 5, the static analyzer reports that the program suffers from a use-after-free issue. I usually prefer the runtime analyzers over the static analyzer, as the reporting is verbose. A compiler flag such as -fsanitize=address instruments the program with the Address Sanitizer (ASan) so that issues like use-after-free are detected at runtime. It also provides a stack backtrace. Another one I frequently use is -fsanitize=undefined, which detects undefined behavior at runtime (UBSan). Some of the potential uses of UBSan are detecting array subscripts that are out of bounds where the bounds can be statically determined [2], signed integer overflow, and dereferencing misaligned or null pointers [3].


Script started on Wed Aug 27 16:28:49 2025

bash-3.2$ clang --analyze -DLISTING1 segfault.c
segfault.c:107:6: warning: Use of memory after it is freed [unix.Malloc]
*x = 0xA455;
~~ ^
warning generated.
bash-3.2$ exit

Script done on Wed Aug 27 16:29:15 2025
Listing 5: Using --analyze flag on clang
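The misaligned-pointer case that UBSan reports (see [3]) usually comes from casting a position inside a byte buffer to a structure pointer. A common way to avoid it is to copy the bytes into a properly aligned variable instead. The helper below is a minimal sketch: the name `read_timeval` and its parameters are hypothetical, and it assumes that the sender and the receiver share the same layout and byte order for struct timeval.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper: copy a struct timeval out of a byte buffer,
 * starting at any offset. Returns 0 on success and -1 if the structure
 * does not fit inside the buffer at that offset. */
int
read_timeval (const unsigned char *buf, size_t buflen, size_t offset,
              struct timeval *out)
{
    if (offset > buflen || buflen - offset < sizeof *out)
        return -1;

    /* Dereferencing (struct timeval *)(buf + offset) is what gets
     * flagged when the address is misaligned. memcpy has no alignment
     * requirement, and *out is aligned by the compiler. */
    memcpy(out, buf + offset, sizeof *out);
    return 0;
}
```

A caller that received the bytes into a character buffer would pass the buffer, its length, and the offset where the structure starts, then use the fields of the filled-in variable as usual.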

Static analysis has its pros and cons. It ensures that the program being built is hardened against some of the commonly found bugs. But it does come with a tradeoff: compilation time. Both gcc and clang mention that static analysis is expensive compared to other warning flags [4].

References

  1. [0] Before the advent of the MMU, programs used physical memory locations in RAM for various operations. This was an issue that would require its own blog post. LaurieWired made a video discussing virtual memory addressing called How a Clever 1960s Memory Trick Changed Computing. It's safe to say that most general-purpose computers have a dedicated MMU that handles the required translation of virtual memory addresses to physical addresses in RAM. The virtual addresses used by one process are not necessarily distinct from those used by another process. In fact, if two programs were to use the same standard C library, those processes would probably load the library at identical virtual addresses, although it is not a requirement.
  2. [1] https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/malloc.c
  3. [2] One might assume that this issue should be a consideration for ASan. Consider a function variable foo that is an array of characters. Such function variables are stored on the stack, not in the data or the bss section, at runtime. For a program under execution, there probably won't be any segmentation violation if an attempt is made to access a subscript of foo that exceeds the declared size. This is because a function's stack frame contains more than the array: the data laid down by the function prologue as required by the architecture's ABI (such as the return address), a potential stack canary (that must not be tampered with), and the local variables that are declared for the function, so the out-of-bounds access usually still lands in mapped memory. On most architectures the stack grows toward lower addresses, so there's a chance that a buffer overflow could modify the return address stored in the frame and cause the function to return to some other location, causing arbitrary code to be executed later.
  4. [3] When I tried to run the programs from Unix Network Programming, by W.R. Stevens, some of them caused this runtime error. The buffered data from the network does not only contain ASCII text. For example, if the server sends the binary data of a struct timeval to the client, the client can't interpret the data as text and needs to assign it to a variable of type struct timeval. A character variable can be placed at any 1-byte aligned address, but the same is not true for a struct timeval variable. If this structure is 8-byte aligned, then an address whose 3 least significant bits (LSb) are not all zero would trigger a memory alignment issue.
  5. [4] https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html and https://clang-analyzer.llvm.org/