Tuesday, September 20, 2016

Separating Code in Header and CPP File


 

Why do we need separate files for C++ code?

C++ code for a module is usually divided into a header file and a .cpp file. The header tells us "what" and the .cpp file tells us "how". In other words, the header is meant to "declare" the interface and the .cpp file is meant to "define" the implementation of that interface.

There is another reason to keep declarations and definitions separate. Let's understand it with an example:-


//Util.h
void PrintHello()
{
   
}

//a.cpp
#include "Util.h"
void foo()
{
      PrintHello();
}


//b.cpp
#include "Util.h"
extern void foo();
int main()
{
      foo();
      PrintHello();
} 


Terminal Snippet:-


sagar@sagar-SVE15127CNS:~$ g++ -c a.cpp -o a.o
sagar@sagar-SVE15127CNS:~$ nm a.o
0000000000000000 T __Z10PrintHellov
0000000000000010 T __Z3foov
sagar@sagar-SVE15127CNS:~$ g++ -c b.cpp -o b.o
sagar@sagar-SVE15127CNS:~$ nm b.o
0000000000000000 T __Z10PrintHellov
                 U __Z3foov
0000000000000010 T _main
sagar@sagar-SVE15127CNS:~$ g++ a.o b.o -o a.out
duplicate symbol __Z10PrintHellov in:
    a.o
    b.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)


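You can see that the PrintHello symbol is defined (type "T") in both a.o and b.o. Because Util.h contains the function's definition, every .cpp file that includes it gets its own copy, so when we link a.o and b.o to make a.out, the linker reports a duplicate symbol.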
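One way to avoid the duplicate symbol, sketched here under the assumption that we are free to restructure Util.h, is to keep only a declaration in the header and put the definition in a single .cpp file. Marking a header-defined function inline is an alternative. The names HelloMessage and InlineHelloMessage and the file Util.cpp are hypothetical, not part of the original example, and both files are shown in one listing for brevity.

```cpp
// ---- Util.h (sketch) ----
#ifndef UTIL_H
#define UTIL_H

#include <string>

// Declaration only: including this from many .cpp files defines nothing.
std::string HelloMessage();

// Alternative: an inline function may be defined in a header, because the
// linker is allowed to merge the identical copies from each object file.
inline std::string InlineHelloMessage()
{
    return "Hello (inline)";
}

#endif // UTIL_H

// ---- Util.cpp (sketch) ----
// #include "Util.h"   // a real Util.cpp would include the header here

// The single definition of HelloMessage for the whole program.
std::string HelloMessage()
{
    return "Hello";
}
```

With this layout, a.cpp and b.cpp can both include Util.h, and only the object file built from Util.cpp carries the definition of HelloMessage.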

What happens when templates come into the picture?

We can walk through the above example again, but this time PrintHello will be a function template, to see how the compiler treats it.


//Util.h
template<typename T>
void PrintHello()
{
}

//a.cpp
#include "Util.h"
void foo()
{
      PrintHello<int>();
}

//b.cpp
#include "Util.h"
extern void foo();
int main()
{
      foo();
      PrintHello<int>();
}


Terminal Snippet:-

sagar@sagar-SVE15127CNS:~$ g++ -c a.cpp -o a.o
sagar@sagar-SVE15127CNS:~$ nm a.o
0000000000000010 S __Z10PrintHelloIiEvv
0000000000000000 T __Z3foov
sagar@sagar-SVE15127CNS:~$ g++ -c b.cpp -o b.o
sagar@sagar-SVE15127CNS:~$ nm b.o
0000000000000020 S __Z10PrintHelloIiEvv
                 U __Z3foov
0000000000000000 T _main
sagar@sagar-SVE15127CNS:~$ g++ a.o b.o -o a.out
sagar@sagar-SVE15127CNS:~$ nm a.out
0000000100000f90 t __Z10PrintHelloIiEvv
0000000100000f80 T __Z3foov
0000000100000000 T __mh_execute_header
0000000100000fa0 T _main
                 U dyld_stub_binder 


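Again, a.o and b.o both have the PrintHello symbol in their object files, but this time linking a.o and b.o works fine and there is no duplication error. The difference is that nm reports PrintHello<int> with type "S" in a.o and b.o, not "T". The template itself generates no code; code is generated only when the template is instantiated, here as PrintHello<int>. The compiler emits each implicit instantiation as a weak (coalesced) definition instead of an ordinary text-section definition, so the linker is allowed to find the same instantiation in several object files and keeps only one copy, as the single PrintHello<int> entry in a.out shows. For more details on symbol types, see the manual page of nm.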
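A small sketch of the "one function per instantiation" idea: each distinct template argument gives its own function, generated when it is first used. NextId is a hypothetical helper, not part of the original example. Its static counter is only there to make the separate instantiations visible.

```cpp
// Util.h (sketch): a template definition can live in a header.
template <typename T>
int NextId()
{
    // Each instantiation (NextId<int>, NextId<double>, ...) is a separate
    // function with its own static counter.
    static int counter = 0;
    return ++counter;
}
```

Calling NextId<int>() twice advances one counter, while NextId<double>() starts from its own, because the compiler generated two independent functions from the same template.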

Don't make the same mistake with template specialization

Template specialization works differently. A function template by itself produces no code, but an explicit (full) specialization is an ordinary function with a definition, so it produces code in every object file that sees it. We often make the same mistake here and define the specialization right next to the template, which is mostly in a header file. But as I said, a header file is only meant for declarations or templates, which produce no code on their own. So we should not define a template specialization in a header unless it is marked inline; declare it in the header and define it in a single .cpp file instead.
Let's walk through the above example again with a template specialization.

 
//Util.h
template<typename T>
void PrintHello()
{
}

template<>
void PrintHello<int>()
{
}

//a.cpp
#include "Util.h"
void foo()
{
      PrintHello<int>();
}

//b.cpp
#include "Util.h"
extern void foo();
int main()
{
      foo();
      PrintHello<int>();
}


Terminal Snippet:-


sagar@sagar-SVE15127CNS:~$ g++ -c a.cpp -o a.o
sagar@sagar-SVE15127CNS:~$ nm a.o
0000000000000000 T __Z10PrintHelloIiEvv
0000000000000010 T __Z3foov
sagar@sagar-SVE15127CNS:~$ g++ -c b.cpp -o b.o
sagar@sagar-SVE15127CNS:~$ nm b.o
0000000000000000 T __Z10PrintHelloIiEvv
                 U __Z3foov
0000000000000010 T _main
sagar@sagar-SVE15127CNS:~$ g++ a.o b.o -o a.out
duplicate symbol __Z10PrintHelloIiEvv in:
    a.o
    b.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)


As I explained above, a template specialization is a real function definition, and this is reflected in the object code: PrintHello<int> now has type "T" in both a.o and b.o, so the linker reports a duplicate symbol.
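Two possible ways out, sketched under the assumption that the specialization has to stay visible to every file that includes Util.h: mark the specialization inline, or only declare it in the header and define it in one .cpp file. TypeLabel and Util.cpp are hypothetical names used for illustration, and both files are shown in one listing.

```cpp
// ---- Util.h (sketch) ----
#include <string>

// Primary template: produces code only when it is instantiated.
template <typename T>
std::string TypeLabel()
{
    return "generic";
}

// Option 1: define the full specialization in the header, but mark it
// inline so that identical copies in several object files are allowed.
template <>
inline std::string TypeLabel<int>()
{
    return "int";
}

// Option 2: only declare the full specialization in the header...
template <>
std::string TypeLabel<double>();

// ---- Util.cpp (sketch) ----
// ...and define it in exactly one .cpp file.
template <>
std::string TypeLabel<double>()
{
    return "double";
}
```

Either way, every translation unit sees that a specialization exists, but the program ends up with only one non-inline definition of it.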

Friday, September 16, 2016

How does a C++ compiler compile a program?



The chewing of C/C++ code by the compiler can be broadly divided into three stages: preprocessing, object code generation, and linking the object code.

1. Preprocessor


The preprocessor is a text substitution utility. It expands macros, resolves #include and conditional directives, and generates a temporary C/C++ file which contains only the relevant code.
Consider a sample code:-

// a.cpp
#define VAL 10

int main()
{
    int x = VAL;
    return 0;
}




The output of the preprocessor for the above program will be:-

int main()
{
    int x = 10;
    return 0;
}


You can try this yourself; g++ provides an option (-E) to dump the preprocessor output.
Terminal Snippet:-

sagar@sagar-SVE15127CNS:~$ g++ -E a.cpp
# 1 "a.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 336 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "a.cpp" 2


int main()
{
    int x = 10;
    return 0;
}
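Because the substitution is purely textual, a macro argument is pasted in as written, not evaluated first. The sketch below illustrates this. NAIVE_SQUARE, SAFE_SQUARE and the two wrapper functions are hypothetical names that do not appear in the sample above.

```cpp
#define NAIVE_SQUARE(x) x * x
#define SAFE_SQUARE(x) ((x) * (x))

// After preprocessing this body reads: return 1 + 2 * 1 + 2;
int NaiveResult()
{
    return NAIVE_SQUARE(1 + 2);
}

// After preprocessing this body reads: return ((1 + 2) * (1 + 2));
int SafeResult()
{
    return SAFE_SQUARE(1 + 2);
}
```

NaiveResult() returns 5 rather than 9, because the compiler only ever sees the already substituted text. Running g++ -E on such a file shows the expanded expressions directly.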


2. Object Code generation

The expanded C/C++ code goes through several compilation phases, which can be seen in the diagram below.












Compiler Phases :-
Let's quickly talk about the compiler phases.

Lexical Analyzer:- The compiler tokenizes the complete C++ code.
Syntax Analyzer:- The compiler creates a syntax tree and checks whether it matches the grammar of the C++ language.
Semantic Analyzer:- The compiler verifies the semantics of the C++ code using the syntax tree.
Optimization:- The compiler performs code optimizations such as constant propagation, loop unrolling, etc.
Object code generation:- The compiler generates object code.
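To make the first phase a little more concrete, here is a deliberately tiny sketch of tokenization. It is not how g++ or clang implement their lexers. Tokenize is a hypothetical helper that only knows identifiers, numbers and single-character punctuation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits source text into word/number tokens and single-character
// punctuation tokens. Real C++ lexers also handle multi-character
// operators, string literals, comments and much more.
std::vector<std::string> Tokenize(const std::string& source)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens
        } else if (std::isalnum(c) || c == '_') {
            // Consume a run of letters, digits and underscores.
            const std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            tokens.push_back(source.substr(start, i - start));
        } else {
            // Anything else becomes a one-character token.
            tokens.push_back(std::string(1, source[i]));
            ++i;
        }
    }
    return tokens;
}
```

For the statement int x = 10; this sketch produces the five tokens int, x, =, 10 and ;, which is the kind of stream the syntax analyzer then works on.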

We will cover the compiler phases in detail in other posts.

Object File:-

What is an object file? One thing is sure: it is not an executable file. An object file contains the machine code generated from its corresponding C++ source file, together with a table of the symbols it defines and the symbols it still needs. Let's try to understand this with an example:-

Consider the sample code:-

// a.cpp
extern void func();
void foo()
{
   func();  
}

The above code compiles fine because func is declared, even though it is not defined; the compiler does not care whether a declared symbol is defined or not. It is the linker that requires a definition for every symbol that is used.

Before going further, let's understand some shell commands:-
1.  nm - displays the name list (symbol table) of an object file.
2.  c++filt - demangles C++ and Java symbols.

We can compile the above code with g++ and look at what is in the object file.

Terminal Snippet:-

sagar@sagar-SVE15127CNS:~$ g++ -c a.cpp -o a.o
sagar@sagar-SVE15127CNS:~$ nm a.o
0000000000000000 T __Z3foov
                 U __Z4funcv
sagar@sagar-SVE15127CNS:~$ c++filt __Z3foov
foo()
sagar@sagar-SVE15127CNS:~$ c++filt __Z4funcv
func()

We used the nm command to look at the symbols inside the object file. There are 2 symbols, foo and func. The T before foo means it is defined in the text (code) section of this object file; the U before func means this symbol is not defined in this object file and must come from another one.
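What c++filt does can also be done from code. The sketch below assumes a GCC or Clang toolchain, where the header cxxabi.h provides abi::__cxa_demangle. Demangle is a hypothetical wrapper name. Note that nm on macOS prints one extra leading underscore, so the symbol shown as __Z3foov is passed here as _Z3foov.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if it cannot be demangled.
std::string Demangle(const std::string& mangled)
{
    int status = 0;
    char* demangled =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        return mangled;
    }
    std::string result(demangled);
    std::free(demangled);  // the returned buffer is allocated with malloc
    return result;
}
```

Here Demangle("_Z3foov") gives foo() and Demangle("_Z4funcv") gives func(), matching the c++filt output in the terminal snippet.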

Consider two more files:-

// b.cpp
void func()
{
}

//c.cpp
extern void foo(); 
int main()
{
    foo();
    return 0;
} 


We can compile the above files with g++ and look at what is in their object files.

Terminal Snippet:-


sagar@sagar-SVE15127CNS:~$ g++ -c b.cpp -o b.o
sagar@sagar-SVE15127CNS:~$ nm b.o
0000000000000000 T __Z4funcv
sagar@sagar-SVE15127CNS:~$ g++ -c c.cpp -o c.o
sagar@sagar-SVE15127CNS:~$ nm c.o
                 U __Z3foov
0000000000000000 T _main 

3. Linker

Linking is the phase where the linker combines all the object files, resolves every symbol reference to its definition, and creates a single executable. This executable contains all the required symbols.

We can understand this with some terminal snippets:-

sagar@sagar-SVE15127CNS:~$  g++ a.o b.o c.o -o output.out
sagar@sagar-SVE15127CNS:~$  nm output.out 
0000000100000f70 T __Z3foov
0000000100000f80 T __Z4funcv
0000000100000000 T __mh_execute_header
0000000100000f90 T _main
                 U dyld_stub_binder



Please share your feedback about this post.