Friday, September 16, 2016

How a C++ Compiler Compiles a Program



The way a compiler chews through C/C++ code can be broadly divided into three stages: preprocessing, object code generation, and linking the object code.

1. Preprocessor


The preprocessor is a text substitution utility. It processes all the directives (#include, #define, #ifdef and so on), expands macros, and generates a temporary C/C++ file which contains only the relevant code.
Consider this sample code:-

// a.cpp
#define VAL 10

int main()
{
    int x = VAL;
    return 0;
}




The preprocessor output for the above program will be:-

int main()
{
    int x = 10;
    return 0;
}


You can try this yourself; g++ provides an option (-E) to dump the preprocessor output.
Terminal Snippet:-

sagar@sagar-SVE15127CNS:~$ g++ -E a.cpp
# 1 "a.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 336 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "a.cpp" 2


int main()
{
    int x = 10;
    return 0;
}
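
Since the preprocessor only substitutes text, function-like macros can expand in ways that are easy to miss. Here is a minimal sketch of that, together with conditional compilation. The file name, the macro names SQUARE_UNSAFE, SQUARE and ENABLE_BONUS, and the three functions are made up for this illustration; they are not part of a.cpp.

```cpp
// macros.cpp (hypothetical file, used only for this illustration)

// Plain text substitution: the argument is pasted in exactly as written.
#define SQUARE_UNSAFE(x) x * x
// Parenthesizing the argument and the whole body avoids precedence surprises.
#define SQUARE(x) ((x) * (x))

// Expands to: 1 + 2 * 1 + 2, which evaluates to 5, not 9.
constexpr int unsafe_result() { return SQUARE_UNSAFE(1 + 2); }

// Expands to: ((1 + 2) * (1 + 2)), which evaluates to 9.
constexpr int safe_result() { return SQUARE(1 + 2); }

// Conditional compilation: ENABLE_BONUS is not defined in this file, so the
// preprocessor drops the #ifdef branch and only the #else branch reaches
// the compiler.
#ifdef ENABLE_BONUS
constexpr int bonus() { return 100; }
#else
constexpr int bonus() { return 0; }
#endif
```

Running g++ -E on a file like this should show the expanded expressions, and only the #else version of bonus() should survive in the output.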


2. Object Code Generation

The expanded C/C++ code goes through various compilation phases, which can be seen in the diagram below.

[Diagram: compiler phases]

Compiler Phases:-
Let's quickly go over the compiler phases.

Lexical Analyzer:- The compiler breaks the complete C++ code into tokens.
Syntax Analyzer:- The compiler builds a syntax tree and checks whether it matches the grammar of the C++ language.
Semantic Analyzer:- The compiler verifies the semantics of the C++ code using the syntax tree (for example, type checking).
Optimization:- The compiler performs code optimizations like constant propagation, loop unrolling etc.
Object code generation:- The compiler generates object code.
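
To get a feel for the optimization phase, here is a hand-written before/after sketch of constant propagation and loop unrolling. The compiler does these transformations on its internal representation, not on the source text, and whether it applies them depends on the optimization level. The function names are hypothetical, and the "after" versions are only an approximation of the effect.

```cpp
// Before: what the programmer writes.
constexpr int area_before() {
    int width = 4;
    int height = width * 2;  // depends only on a known constant
    return width * height;   // 4 * 8
}

// After constant propagation and folding: every value is known at
// compile time, so the whole body can collapse to a single constant.
constexpr int area_after() {
    return 32;
}

// Before: a loop with a small, fixed trip count.
constexpr int sum_before(const int (&values)[4]) {
    int total = 0;
    for (int i = 0; i < 4; ++i) {
        total += values[i];
    }
    return total;
}

// After loop unrolling: the loop counter and the branch are gone,
// and the four additions are written out directly.
constexpr int sum_after(const int (&values)[4]) {
    return values[0] + values[1] + values[2] + values[3];
}
```

Both versions of each function return the same value; an optimization is only allowed to change how the result is computed, not the result itself.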

We will cover the compiler phases in detail in other posts.

Object File:-

What is an object file? One thing is sure: it is not an executable file. An object file contains the machine code (object code) generated from a single C++ source file, along with a symbol table listing the symbols it defines and the symbols it still needs from elsewhere. Let's understand this with an example.

Consider the sample code:-

// a.cpp
extern void func();
void foo()
{
   func();  
}

The above code compiles fine because func is declared, even though it is not defined here. The compiler does not care whether a symbol is defined, only that it is declared. It is in the linking phase that the linker requires a definition for every symbol.

Before going further, let's look at two shell commands:-
1.  nm - displays the name list (symbol table) of an object file.
2.  c++filt - demangles C++ and Java symbols.

Let's compile the above code with g++ and look at what the object file contains.

Terminal Snippet:-

sagar@sagar-SVE15127CNS:~$ g++ -c a.cpp -o a.o
sagar@sagar-SVE15127CNS:~$ nm a.o
0000000000000000 T __Z3foov
                 U __Z4funcv
sagar@sagar-SVE15127CNS:~$ c++filt __Z3foov
foo()
sagar@sagar-SVE15127CNS:~$ c++filt __Z4funcv
func()

We used the nm command to list the symbols inside the object file. There are two symbols, foo and func. The T before foo means it is defined in the text (code) section of this object file. The U before func means the symbol is undefined in this object file.
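
What c++filt does can also be tried from inside a program. Below is a small sketch using abi::__cxa_demangle from <cxxabi.h>, which is available with GCC and Clang (it is not standard C++). The helper name demangle is my own. Note that nm on macOS prints an extra leading underscore, so the names here use the plain _Z prefix. In _Z3foov, _Z marks a mangled name, 3foo is the function name preceded by its length, and v says the function takes no parameters.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang, not standard C++)
#include <string>

// Demangles one symbol name, roughly what c++filt does for a single symbol.
// Returns the input unchanged when the name cannot be demangled.
std::string demangle(const char* mangled_name) {
    assert(mangled_name != nullptr);

    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result with
    // malloc, so the caller has to release it with free.
    char* demangled = abi::__cxa_demangle(mangled_name, nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        return mangled_name;
    }

    std::string result(demangled);
    std::free(demangled);
    return result;
}
```

Calling demangle("_Z3foov") should give foo(), the same as the c++filt output above.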

Consider two more files:-

// b.cpp
void func()
{
}

// c.cpp
extern void foo(); 
int main()
{
    foo();
    return 0;
} 


Let's compile these two files with g++ as well and look at their object files.

Terminal Snippet:-


sagar@sagar-SVE15127CNS:~$ g++ -c b.cpp -o b.o
sagar@sagar-SVE15127CNS:~$ nm b.o
0000000000000000 T __Z4funcv
sagar@sagar-SVE15127CNS:~$ g++ -c c.cpp -o c.o
sagar@sagar-SVE15127CNS:~$ nm c.o
                 U __Z3foov
0000000000000000 T _main

3. Linker

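Linking is the phase where the linker combines all the object files, resolves every symbol reference to its definition, and creates a single executable. This executable contains all the required symbols. Symbols that come from shared libraries can still show up as undefined (U) in the executable; those are resolved by the dynamic loader at run time.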

We can understand this with a terminal snippet:-

sagar@sagar-SVE15127CNS:~$ g++ a.o b.o c.o -o output.out
sagar@sagar-SVE15127CNS:~$ nm output.out
0000000100000f70 T __Z3foov
0000000100000f80 T __Z4funcv
0000000100000000 T __mh_execute_header
0000000100000f90 T _main
                 U dyld_stub_binder



Please share your feedback about this post.
