Journey of a C Program

Ever wondered what happens behind the scenes when you compile a C program? This post walks through it. The journey of a C program from human-readable source file to final executable comprises four stages. Before we delve into each of them, let’s quickly go through the typical process of compiling a C program for context. We are using gcc and the GNU toolchain, the de facto standard compiler and build tools for C on Linux.

Here’s our C program named hello.c :

#include <stdio.h>

#define STRING "hello world\n"

int main(void)
{
	// printing by substituting the macro
	printf(STRING);

	return 0;
}

Next we compile the above source code using gcc like so:

$ gcc hello.c -o hello

Here hello.c is the source file to be compiled. The -o option (a lowercase letter o) tells gcc what to name the resulting executable, hello in this case. If -o is omitted, gcc names the executable a.out by default.

Lastly, we execute the compiled file hello to see the result which in this case just prints “hello world” on the screen.

$ ./hello
hello world

What just happened was a transformation from source code to executable through four distinct stages: pre-processing, compilation, assembly and linking. They are summarized in the following diagram.

[Diagram: journey of a C program]

By default, gcc runs all four stages one after the other to produce the executable. We can make it stop after a particular stage by passing the right command-line switch. Let’s examine each of the four stages in detail.

Stage #1 : Pre-processing

The first stage is pre-processing, during which the following actions take place:

  1. Macros are substituted
  2. Comments are stripped off
  3. Header files are expanded

The pre-processor accepts the .c file. The -E switch tells gcc to stop after pre-processing; unless an output file is named with the -o switch, the result is written to stdout. Here we save it as hello.i:

$ gcc -E hello.c -o hello.i

Let’s examine the hello.i file.

# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 873 "/usr/include/stdio.h" 3 4
extern FILE *popen (const char *__command, const char *__modes) ;

extern int pclose (FILE *__stream);

extern char *ctermid (char *__s) __attribute__ ((__nothrow__ , __leaf__));

# 913 "/usr/include/stdio.h" 3 4
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));

# 943 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2

int main(void)
{

 printf("hello world");

 return 0;
}

Three things stand out in the large pre-processed file.
First, the macro STRING has been replaced with its string value. Second, the comment we wrote above the printf statement has been stripped off. Third, the #include line for stdio.h has been replaced by hundreds of lines of declarations. This shows that the contents of a header file are literally inserted into our source file.
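
To make the first two observations concrete, here is a minimal sketch of macro substitution. The names greeting, greeting_length and STRING_LENGTH are made up for this illustration and are not part of hello.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRING "hello world\n"
/* A macro may refer to another macro; both are expanded before compilation. */
#define STRING_LENGTH (sizeof(STRING) - 1)

const char *greeting(void)
{
	// This comment is stripped by the pre-processor.
	return STRING; /* the compiler proper sees: return "hello world\n"; */
}

size_t greeting_length(void)
{
	/* Expands to (sizeof("hello world\n") - 1), the length without the
	 * terminating NUL, so it must agree with strlen(). */
	assert(STRING_LENGTH == strlen(STRING));
	return STRING_LENGTH;
}
```

Saving this in a file and running gcc -E on it should show the string literal wherever STRING and STRING_LENGTH were used, with neither macro name left in the function bodies.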

If we search for printf, we’ll get the following:

extern int printf (const char *__restrict __format, ...);

The keyword ‘extern’ tells us that printf() is only declared here, not defined; its definition lives outside this file. We will see later how gcc finds the definition of printf().
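
The same split between declaration and definition can be reproduced with a function of our own. This is only a sketch: count_chars and hello_length are hypothetical names, and in a real project the definition would typically sit in a different .c file or in a library.

```c
#include <assert.h>
#include <stddef.h>

/* Declaration: name, parameter types and return type, but no body. */
extern size_t count_chars(const char *s);

size_t hello_length(void)
{
	/* This call compiles with only the declaration in scope. */
	return count_chars("hello world\n");
}

/* Definition: the actual body the call ends up reaching. */
size_t count_chars(const char *s)
{
	assert(s != NULL);

	size_t n = 0;
	while (s[n] != '\0')
		n++;
	return n;
}
```

The compiler is happy with hello_length() as soon as it has seen the declaration, which is exactly the situation our hello.c is in with respect to printf().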

Stage #2 : Compilation

The second stage is compilation, in which the GNU C compiler takes the pre-processed hello.i file and outputs assembly code in a file named hello.s. The “.i” extension tells gcc that the file is C source that has already been pre-processed. The -S switch stops gcc after this stage.

$ gcc -S hello.i

Viewing the hello.s file reveals that the C statements have been translated into assembly language directives and instructions.

	.file	"hello.c"
	.section	.rodata
.LC0:
	.string	"hello world"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	movl	$0, %eax
	call	printf
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Debian 4.9.2-10) 4.9.2"
	.section	.note.GNU-stack,"",@progbits


Stage #3 : Assembly

The third stage is assembly, in which the compiled output is passed to the assembler. gcc recognizes assembly input by its “.s” extension and, with the -c switch, stops after producing an object file with the extension “.o”.

$ gcc -c hello.s

The hello.s file from the previous stage is just a sequence of assembler directives and instructions. gcc internally calls the GNU assembler, as, to translate these assembly-level instructions into machine-level code, also known as object code. You can also run as yourself on hello.s instead of going through gcc, like so:

$ as hello.s -o hello.o

At this stage the existing code is only converted into machine language; calls to external functions such as printf() are not yet resolved.

Since the output of this stage is a machine-level file (hello.o), its content is not meant to be read by humans. If we open hello.o in a text editor anyway, we’ll see something that is almost entirely unreadable.

[Screenshot: contents of the ELF object file]

The one thing we can make out in hello.o is the string ELF near the start of the file. ELF stands for Executable and Linkable Format. It is the format gcc uses on Linux for machine-level object files and executables. Before ELF, a format known as a.out was used; ELF is the newer and more sophisticated of the two.
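
The ELF string we spotted is part of a four-byte signature at the very start of the file: the byte 0x7f followed by the letters E, L and F. As a small sketch, a program could test for it like this. The function name looks_like_elf is made up for this post, and the check covers only the signature, not the rest of the ELF header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The first four bytes of an ELF file: 0x7f, then 'E', 'L', 'F'. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Returns true if the buffer begins with the ELF signature.
 * bytes: the first bytes read from a file; len: how many were read. */
bool looks_like_elf(const unsigned char *bytes, size_t len)
{
	assert(bytes != NULL);

	return len >= sizeof ELF_MAGIC &&
	       memcmp(bytes, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

To try it on hello.o, read the first few bytes of the file with fread() and pass them in. The source file hello.c, being plain text, would fail the same check.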

Note that if you compile without specifying an output file name, the output file is still called ‘a.out’, even though its format is now ELF. The default executable file name says nothing about the format of the machine code; the shared name a.out is merely incidental.

Stage #4 : Linking

This is the last stage, in which the linker combines our object code with the library and start-up code it needs to produce ready-to-run machine-level code. Calling gcc on object files without any of the -E, -S or -c switches links them to produce the final executable.

$ gcc hello.o -o hello

As discussed earlier, until this stage nothing in our code says where functions like printf() are defined. Since the compiler does not know where such functions are implemented, it simply leaves a placeholder for each call. It is at the linking stage that the reference to printf() is resolved against the C library and the placeholder is filled in.

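gcc internally uses the GNU linker, ld, for this task. You can call ld directly, but a bare ld hello.o -o hello fails with an undefined reference to printf, because ld on its own is not told about the C library or the C start-up files. gcc passes these to the linker for us. To see the full link command gcc runs on your system, add the -v switch:

$ gcc -v hello.o -o hello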

The linker also does some extra work: it adds standard start-up and exit code to our program. For example, there is standard code that sets up the running environment, such as passing the command-line arguments and environment variables to the program. Similarly, there is standard code that hands the program’s return value back to the system when it ends.
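
The role of that start-up code can be pictured with a toy model. This is not the real start-up code, which does a good deal more; toy_start, sample_main and entry_fn are hypothetical names used only to show the shape of the hand-off.

```c
#include <assert.h>
#include <stddef.h>

/* The shape of a program's entry point as the start-up code sees it. */
typedef int (*entry_fn)(int argc, char **argv);

/* Stand-in for a program's main(): fails with status 2 when it is
 * given no arguments besides the program name. */
int sample_main(int argc, char **argv)
{
	(void)argv;
	return argc > 1 ? 0 : 2;
}

/* Toy start-up routine: passes the arguments to the entry point and
 * returns the value that would be handed back to the system. */
int toy_start(entry_fn entry, int argc, char **argv)
{
	assert(entry != NULL);

	int status = entry(argc, argv);
	return status;
}
```

In a real program we never write toy_start ourselves; the equivalent code is linked in for us, which is why main() can simply return 0 and the shell still sees an exit status.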

This extra work of the linker can be verified with a small experiment. We already know that the linker turns the object file (hello.o) into an executable file (hello). If we compare the sizes of hello.o and hello, we’ll see the difference.

$ size hello.o hello

The size command gives a rough idea of how much larger the executable is than the object file. The growth comes from the extra standard code the linker adds to our program.

That’s all there is to it: this is what happens when you compile a C program. Isn’t it beautiful?

Pro Tip: You can pass the -save-temps switch to gcc to keep all the intermediate files (hello.i, hello.s and hello.o) with one command.

$ gcc -save-temps hello.c -o hello

About Deepak Devanand

Seeker of knowledge