Compilation stages in C language | Compilation Process in C
Introduction:
We looked at the structure of a C program in an earlier article. In today’s article, we are going to discuss the compilation stages in the C programming language, also called the compilation process in C.
Pre-Requisite:
We are going to compile the program using the GCC compiler. Here is an article that explains the compilation process on Linux using the GCC compiler. Please go through it before reading this article.
Compilation Stages in C Language:
Each C program goes through a few stages on its way from source code to a running program. These stages are called the compilation stages or the compilation process.
Here is the list of stages in the compilation process. These are the steps that happen when we compile and run the program.
- Source Code
- Pre-Processor
- Translator
- Assembler
- Linker
- Loader
Let’s look at each step in detail.
Source Code:
It all starts with the program. Let’s create a simple C program that displays a Hello World message on the console.
prog.c
```c
#include<stdio.h>
#define NAME "Roger"

void main()
{
    /* This is multi line comment
       Print Hello world on console
    */
    printf("Hello World : %s \n", NAME);  // This is Single line Comment
}
```
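We save C programs with the .c extension. (GCC accepts void main() here, but standard C expects main to return int. We keep void main() so that the intermediate files shown later match this source.)
If you compile and run the program, you will get the following output.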
```
$ gcc prog.c
$ ./a.out
Hello World : Roger
$
```
We compiled the program using the gcc compiler and then ran the resulting a.out file. Here is an article on how to compile and run programs in Linux.
Pre-Processor step:
Preprocessor directives always start with the hash symbol (#). A C program is passed through the preprocessor before it goes to the compiler.
Commonly used preprocessor directives are #include and #define.
Preprocessing is one of the important steps in the compilation process in C.
Here are the actions performed by the pre-processor.
- File Inclusion ( Adding Header files)
- Removing Comments
- Macro Substitution
- Conditional Compilation
File Inclusion (Adding Header files):
The #include preprocessing directive is used for including files (like header files).
In the above program, #include<stdio.h> is the preprocessor directive.
stdio.h is a header file that contains the prototypes of the input and output functions like printf and scanf.
If we use the standard input and output functions like printf or scanf in our program, then we have to include the stdio.h header file.
In this preprocessing step, the preprocessor copies the contents of the header files into our program, which creates a new intermediate file. So if you check the output after the preprocessing step, you will notice that many lines of code have been added to your original code. These lines come from the header files that we included at the top of the program using the #include directive.
Let’s see how the above program looks after header file inclusion.
Normally we use the gcc prog.c command to compile the C program, which generates the executable file. Doing so does not keep the intermediate files. We only get the final executable file (on Linux we get the a.out file, on Windows we get a .exe file).
To generate the intermediate file after the preprocessing stage, we need to use the following command.
```
$ gcc -E prog.c -o prog.i
```
If you check the prog.i file, you will notice that many lines of code have been added to your program.
```
....
....
....
....
....
extern char *ctermid (char *__s) __attribute__ ((__nothrow__ , __leaf__))
  __attribute__ ((__access__ (__write_only__, 1)));
# 867 "/usr/include/stdio.h" 3 4
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));

extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;

extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 885 "/usr/include/stdio.h" 3 4
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 902 "/usr/include/stdio.h" 3 4

# 2 "prog.c" 2

# 4 "prog.c"
void main()
{

    printf("Hello World : %s \n", "Roger");
}
```
If you look at the line numbers of the prog.i file, our main function started at the 740th line in our run. Everything before it was added by the preprocessor from the stdio.h header file and the headers it includes.
If you search for the printf function in the file, you can find the prototype of the printf function.
```c
extern int printf (const char *__restrict __format, ...);
```
You can also find the prototype of all other input and output functions like scanf, sprintf, fprintf, etc.
Comments Removal:
The preprocessor also removes the comments.
Comments are short descriptions of the source code. They are used to explain how a block of code is implemented, so that the source code is more readable and understandable to other programmers and colleagues.
C supports single-line and multi-line comments. You can learn more about comments in C in a separate article.
If you look at the above intermediate file prog.i, you can see this in action.
Initially, our program had two comments: one multi-line comment
```c
/* This is multi line comment
   Print Hello world on console
*/
```
and one single-line comment.
```c
printf("Hello World : %s \n", NAME);  // This is Single line Comment
```
Both of those comments are removed in the prog.i file.
```
# 2 "prog.c" 2

# 4 "prog.c"
void main()
{

    printf("Hello World : %s \n", "Roger");
}
```
Macro Substitution:
The #define directive is used to define symbolic constants / macros.
We have defined a constant NAME using the #define directive.
If we look at the intermediate file, i.e. prog.i, we can see that the #define line is removed and NAME is replaced with the "Roger" string.
```
# 2 "prog.c" 2

# 4 "prog.c"
void main()
{

    printf("Hello World : %s \n", "Roger");
}
```
So the preprocessor replaces all macros and symbolic constants with their definitions.
Conditional Compilation:
We use conditional compilation to include or leave out parts of the code at compile time, for example to remove unnecessary code.
It is done with directives such as #ifdef, #else, and #endif.
Here is pseudo code for conditional compilation.
```c
#ifdef NAME
    // Statements
#else
    // Statements
#endif
```
Translator or Compiling stage:
The translator/compiler translates the high-level language code into assembly language code.
After this stage, we get assembly-level or low-level language code, which consists of instruction mnemonics such as movq, pushq, call, and ret.
To generate the assembly code, use the following command.
```
$ gcc -S prog.i -o prog.s
```
We need to pass the pre-processed .i file (i.e. prog.i) to gcc with the -S option, which generates the assembly file. We usually save assembly files with the .s extension. Here prog.s is the file that contains the assembly code.
If you look at the prog.s file, you will find the assembly language mnemonics. Here is what our prog.s file looks like.
```
    .file   "prog.c"
    .text
    .section    .rodata
.LC0:
    .string "Roger"
.LC1:
    .string "Hello World : %s \n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rax
    movq    %rax, %rsi
    leaq    .LC1(%rip), %rax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf@PLT
    nop
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 11.2.0-19ubuntu1) 11.2.0"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align 8
    .long   1f - 0f
    .long   4f - 1f
    .long   5
0:
    .string "GNU"
1:
    .align 8
    .long   0xc0000002
    .long   3f - 2f
2:
    .long   0x3
3:
    .align 8
4:
```
Assembler:
Now we have the assembly language code. The next step is converting the assembly code into binary code or machine language code.
The assembler is also one type of translator, but it is used to translate assembly language into machine language code (machine language is the language the computer understands, made up of zeros and ones only).
After this stage, we get binary code. This code is also called object code.
To generate the object code, we need to use gcc with the -c option and pass the assembly file, like below.
```
$ gcc -c prog.s -o prog.o
```
The output file prog.o contains the machine language code, that is, the zeros and ones understood by the computer. It is not readable text, but if you have a hex editor, you can open the file and look at its bytes.
Linker:
A C program contains many library functions (predefined functions) and user-defined functions.
The predefined functions are present in library files. The library files contain the definitions of the predefined functions in binary form (machine code).
Functions like printf and scanf are predefined functions. The prototypes of these functions are available in the stdio.h header file. Please note that the stdio.h header file only contains the prototypes.
The actual function definitions of printf and scanf are not present in the stdio.h file. They are present in library files. So the linker links the necessary library files with our program, so that the definitions of functions like printf and scanf are available to the program.
In a nutshell, the linker adds the necessary library code to our program. We can also say that the linker is used to link the called function with the calling function.
After the linker stage, our executable (a.out on Linux / .exe on Windows) is ready. The executable file is stored in your computer’s secondary storage.
To generate the executable file, use the gcc command, like below.
```
$ gcc prog.o
```
prog.o is the object file that we got after the assembler stage.
The above command, gcc prog.o, generates the executable file. On Linux you will see an a.out file. If you are using Windows, you will notice a new .exe file.
📢 All four compilation stages above, from the preprocessor to the linker, are equivalent to simply running gcc prog.c. Notice that here we pass our C source code directly to gcc without any options. This command creates the executable file, and it does not leave intermediate files like prog.i, prog.s, and prog.o behind.
In most cases, we simply run gcc prog.c to generate the executable, except when we want to debug macro substitutions, conditional compilation, and so on.
Loader:
Now we have the executable file in secondary memory (i.e. HDD/SSD).
The loader loads the executable file from secondary memory into primary memory. Primary memory means RAM.
Any application needs to be brought into primary memory in order to execute, but all our executable files are stored in secondary memory. The loader is used to bring executable files from secondary memory to primary memory, typically from the hard disk/SSD to RAM.
The loader stage corresponds to running the ./a.out command or opening the .exe file, which loads our application into RAM and starts the execution of the program.
When the program starts executing, we call it a process. A process is an application that is executing.
These are the compilation stages in C, or the compilation process of a C program, which starts with source code and ends with a running application.
Conclusion:
We discussed the compilation process in the C language, also called the compilation stages in C. We looked at each step and examined it with examples. We hope you got a good idea of what happens when we compile a C program.
We are going to discuss the standard input and standard output functions in the next article.