Can You Trust Your Compiler?

Sometimes it’s helpful to step back and take a fresh look at familiar topics. Let’s think about software development and ask what appears to be a strange question: Can we trust the compiler to generate machine code that does what our source code specifies?

What You Write

Most programmers these days write in high-level languages: C, C++, C#, Java, Python, and so on. Reuse of object-oriented code means that we increasingly treat our code as a collection of generalized modules. We know that a given module accomplishes a certain task, so we don’t get “down in the weeds” with the details of its code if we can avoid it.

What The Computer Runs

The processor executes machine code, instructions that the hardware implements with low-level moves, copies, additions, logical operations like AND and XOR, and more.

Do As I Mean, Not As I Say

Python is run by an interpreter and Java runs in a virtual machine. Let’s avoid the added complication and think about a C program. Let’s keep it very simple:

#include <stdio.h>

int main(void) {
        printf("Hello, world\n");
        return 0;
}

Yes, a program that does nothing more than send the string “Hello, world” to standard output. It outputs 13 characters including the newline. But what happens when that is turned into a program?

It depends…

Let’s compile it, turning my C source code into an executable file:

$ make hello
cc hello.c -o hello

On Linux I get a 12,297 byte ELF file containing my string, references to shared libraries, and other data. On OpenBSD the result is 8,161 bytes long, even though the binaries are meant for the same architecture.

It gets stranger: You can use the -O1, -O2, or -O3 options to optimize the compilation. This trivial example yields executable files of the same size and function, but with different SHA-256 hashes. Here’s a way to see what the different optimization levels do:

$ gcc -c -Q -O1 --help=optimizers > /tmp/O1-options
$ gcc -c -Q -O2 --help=optimizers > /tmp/O2-options
$ gcc -c -Q -O3 --help=optimizers > /tmp/O3-options

Compare them with diff or browse them with more. Then consider this: When you ask for optimization, the compiler transforms the source code you provided into something slightly different. It will unroll loops, convert if-then-else logic, attempt to optimize branch prediction, and make other changes.
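To make the idea of transformation concrete, here is a minimal sketch. The function name sum_below is hypothetical and chosen only for illustration. The source describes a loop, but an optimizing compiler is free to unroll it, vectorize it, or replace it with something else entirely, as long as the result is the same. What a particular compiler actually emits depends on its version, the target, and the options you pass.

```c
#include <stdio.h>

/* Hypothetical example: sum the integers 0 .. n-1 with a plain loop.
   The C source says "loop n times"; the machine code produced at
   higher optimization levels may look quite different, while still
   returning the same value. */
unsigned int sum_below(unsigned int n)
{
    unsigned int total = 0;
    for (unsigned int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Small helper so the result can be printed from a test program. */
void print_sum_below(unsigned int n)
{
    printf("sum_below(%u) = %u\n", n, sum_below(n));
}
```

One way to see the difference is to ask the compiler for assembly output at two optimization levels, for example with cc -O0 -S and cc -O2 -S, and compare the two resulting files with diff.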

Aggressive Optimization Is Considered Potentially Unsafe

A compiler transforms your source code into another form. Don’t worry only about bugs within the compiler’s optimization modules: a bug anywhere in the compiler may introduce bugs into your compiled code, optimized or not.

We don’t have to trip over compiler bugs to run into security problems. Consider a program that must handle an especially sensitive piece of data, like a cryptographic key. There may be a security requirement that the memory be overwritten after the key is no longer needed. However, an optimizing compiler may conclude that the memory is never read again by the program, and optimize away the overwriting! See this page at CERT’s Secure Coding Standards site for some examples of both risky and safe code. More formally, this paper from MIT studies the class of optimization-unstable software in which code is unexpectedly discarded by optimizations. Guess what: the Linux kernel, the Postgres database, the Chromium browser, the Python interpreter, and other major projects contain optimization-unstable code!
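The key-wiping problem described above can be sketched as follows. This is an illustration under stated assumptions, not a vetted implementation: secure_wipe and checksum_with_key are hypothetical names, and the "key" is a dummy value. The sketch writes zeros through a volatile-qualified pointer, a commonly used technique that compilers generally do not remove, although it is not an iron-clad guarantee on every toolchain.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero n bytes through a volatile-qualified
   pointer, so that each store is a volatile access the compiler is
   expected to keep even if the buffer is never read again. */
void secure_wipe(void *buf, size_t n)
{
    volatile unsigned char *p = buf;
    while (n--) {
        *p++ = 0;
    }
}

/* Hypothetical stand-in for code that handles a secret. */
int checksum_with_key(const unsigned char *msg, size_t len)
{
    unsigned char key[16];
    int sum = 0;

    memset(key, 0x5A, sizeof key);   /* pretend this loads a real key */
    for (size_t i = 0; i < len; i++) {
        sum += msg[i] ^ key[i % sizeof key];
    }

    /* A plain memset(key, 0, sizeof key) here would be a dead store:
       key is never read afterward, so an optimizer may delete it. */
    secure_wipe(key, sizeof key);
    return sum;
}
```

Where your platform provides one, a purpose-built function is usually a better choice than a hand-rolled loop: explicit_bzero() on OpenBSD and recent glibc, or memset_s() where the optional C11 Annex K is implemented. Either way, checking the generated assembly is the only way to be sure the wipe survived.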

Learning Tree’s System and Network Security Introduction is, after all, an introduction, and we don’t usually get to this level in that course, but secure coding may be your eventual destination. Be careful when compiling your software, and make sure that you get the code you want!
