Learning C++ from Java - Building, Namespaces, Linkage, and more
I recently had to learn C++ for work and have done most of that learning so far through Learn C++. This series of posts will highlight what I have found interesting about C++, especially given my Java background.
Note that these posts are not meant to be tutorials on writing C++; I recommend visiting Learn C++ if you want a thorough C++ tutorial.
Table of Contents
- Building a C++ Program
- Initializing Variables
- Namespaces
- Linkage
- Storage Duration
- The 'static' keyword
- Conclusion
- Further Reading
Building a C++ Program #
A C++ program comprises one or more source files and header files, and the steps to create an executable from source code are preprocessing, compilation, and linking.
Preprocessing #
This involves handling all preprocessor directives. Preprocessor directives are instructions that tell the preprocessor to perform text manipulation tasks. An example is the #include directive, which tells the preprocessor to include the contents of a header file into the current file.
The output of the preprocessor is one translation unit per source file. A translation unit is a C++ source file after the preprocessor has included the contents of the header files it references and processed its other directives.
Compilation #
This is where the compiler converts each translation unit into an object file containing machine code. These object files cannot be run yet but can be stored for reuse later on.
Linking #
This is the final stage, where a linker combines all the object files to form an executable program. This stage involves linking any needed library code and resolving all cross-file dependencies.
C++ compilers typically come bundled with separate programs to perform all three steps.
Initializing Variables #
There are several ways to initialize a C++ variable. The differences between them matter most when initializing complex types; for primitive types, they have a similar effect.
This section will focus primarily on initializing primitive types, with a discussion on complex types to come in a later post.
Copy Initialization #
Copy initialization takes place when you initialize a variable using an equals sign.
double width = 10.23;
int height = 4;
Here, the program copies the value on the right-hand side of the equals sign into the storage of the variable on the left.
Copying can be an expensive operation for large objects, but it is efficient for primitive types.
Direct Initialization #
Direct initialization is initialization using non-empty parentheses.
double width(10.23);
int height(4);
Direct initialization and copy initialization do the same thing for primitive types, but have different behaviours for class objects, which you can learn more about here.
List Initialization #
List initialization (also called uniform initialization or brace initialization) can occur in both direct initialization and copy initialization contexts as direct-list-initialization and copy-list-initialization.
double width{10.23}; // Direct-list-initialization.
int height = {4}; // Copy-list-initialization.
One difference between using list or brace initialization and the other forms of initialization for primitive types is that list initialization is stricter. Quoting Learn C++:
Brace initialization has the added benefit of disallowing “narrowing” conversions. This means that if you try to use brace initialization to initialize a variable with a value it can not safely hold, the compiler will throw a warning or an error.
So if you attempt to initialize a variable like int speed{43.1}, the compiler will reject the list initialization with an error (or at least a warning), while copy and direct initialization will simply drop the fractional part.
Unlike direct and copy initialization, you can also use list initialization to populate containers (collections in Java) with elements during initialization like:
std::vector<double> widths = {12.3, 10.2, 3.4, 4.7};
std::vector<int> heights{2, 3, 4, 5};
Value Initialization and Zero Initialization #
Value initialization takes place when you initialize a variable with empty braces. For primitive types, this will initialize the variable with a zero value equivalent for the type.
double width{}; // The value is 0.0.
int height{}; // The value is 0.
Default Initialization #
Default initialization takes place when you declare a variable with no initializer.
double width; // The value is indeterminate.
int height;
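For local variables such as width and height in the above example, the C++ standard does not define what the values should be: they are indeterminate, and reading them before assigning a value is undefined behaviour. In many compilers, the values will be whatever garbage is in the memory allocated for the variables. Global variables are an exception, since they are zero-initialized before any other initialization takes place.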
Namespaces #
Namespaces in C++ help to prevent naming collisions by providing scope to identifiers. They are similar to Java packages in that they both prevent naming collisions, but you can nest namespaces in C++ and can declare multiple namespaces in a single C++ file, among other differences.
For example, if you define a namespace like
namespace Math {
int add(int x, int y){
return x+y;
}
}
You can call the add function by prefixing it with its namespace, as in Math::add(1,2), and the compiler won't confuse it with an add function defined in a separate namespace.
You can also declare functions and other identifiers in C++ outside an explicit namespace. Such identifiers become part of the global namespace of your program. Using the above example, if you define an add function in the global namespace, you can refer to it as ::add(3,4) or just add(3,4) if there is no conflicting add() function defined.
C++ also lets you declare the same namespace in multiple files, provided there's only one definition of each identifier in the namespace. All of those declarations contribute to the same namespace, and the linker resolves references between the files.
Namespace Aliases #
I mentioned earlier that you can nest namespaces in C++, so it's not uncommon to see code like:
School::Subject::Maths::Numbers::Arithmetic::add(12,34);
Okay, I exaggerate and I hope it is uncommon, but my point is namespaces can be deeply nested and C++ provides a convenient way to refer to nested namespaces through namespace aliases.
You can define a namespace alias for the above example like:
namespace MathsArithmetic = School::Subject::Maths::Numbers::Arithmetic;
And invoke the add function using:
std::cout << MathsArithmetic::add(100, 345) << "\n";
In this example, cout is an object (the standard output stream) that belongs to the std namespace.
Linkage #
Linkage is a property of an identifier that specifies whether it is visible outside its translation unit.
If an identifier has internal linkage, it is only visible within its own translation unit. const global variables have internal linkage by default.
Quoting Learn C++:
An identifier’s linkage determines whether other declarations of that name refer to the same object or not...This means that if two files have identically named identifiers with internal linkage, those identifiers will be treated as independent.
For example, if I have two source files, fruit.cpp and vegetable.cpp, containing the following definitions:
fruit.cpp:
#include <string>
#include <iostream>
const std::string favouriteFruit = "carrot";
int main (){
std::cout << favouriteFruit << "\n";
}
vegetable.cpp:
#include <string>
#include <iostream>
const std::string favouriteFruit = "carrot";
void printFruit()
{
std::cout << favouriteFruit << "\n";
}
Because const global variables have internal linkage, there will be no issues here during the linking phase. Each translation unit will get its own copy of favouriteFruit, avoiding any conflicts.
To make an identifier have external linkage, use the extern keyword.
An identifier with external linkage is visible from any translation unit in the program. All functions and non-const global variables implicitly have external linkage.
Using the previous example,
fruit.cpp:
#include <string>
#include <iostream>
std::string favouriteFruit = "carrot";
int main (){
std::cout << favouriteFruit << "\n";
}
vegetable.cpp:
#include <string>
#include <iostream>
std::string favouriteFruit = "carrot";
void printFruit()
{
std::cout << favouriteFruit << "\n";
}
Here, both definitions of favouriteFruit have external linkage and will cause a linking error because of the multiple definitions available to the linker, in violation of the one definition rule. Use the static keyword to make an identifier with external linkage have internal linkage.
Local variables have no linkage, which means all declarations of local variables with the same name refer to different objects.
The next post on header files will include a broader discussion on linkage with more usage examples, and I recommend reading this article if you want to learn more before then.
Linkage is not scope #
While scope and linkage may seem similar, they mean different things. An identifier's scope defines where it is visible within its translation unit, while linkage determines whether declarations of the same name in other translation units refer to the same identifier or not.
An article from IBM also distinguishes them like:
Scope and linkage are distinguishable in that scope is for the benefit of the compiler, whereas linkage is for the benefit of the linker. During the translation of a source file to object code, the compiler keeps track of the identifiers that have external linkage and eventually stores them in a table within the object file. The linker is thereby able to determine which names have external linkage, but is unaware of those with internal or no linkage.
Storage Duration #
All variables in a C++ program have a storage duration, which determines the rules for when the program creates and destroys them. Two forms of this duration are automatic and static duration.
When a variable has automatic duration, it means the program allocates its storage at the point of the variable's definition and deallocates it when the program exits the enclosing code block. Local variables have automatic duration by default.
For a variable with static duration, its storage is allocated when the program starts and deallocated when the program ends. Only one instance of a variable with static duration exists. Global variables have static duration.
You can also use the static keyword to change the duration of a local variable to static duration. When you make a local variable static, the program only initializes the variable the first time it encounters the initialization. Subsequent calls to its enclosing function will reuse the already initialized instance.
Variables can also have dynamic duration, which means the program creates and destroys them on programmer request, as in dynamically allocated variables created with the new keyword.
The 'static' keyword #
So far, I've mentioned the static keyword as both a way to denote internal linkage and static storage duration. This might be confusing, so note that the static keyword in C++ can appear in three contexts:
- When used in declaring a variable, the static keyword acts as a storage class specifier. On a global variable, it gives the variable internal linkage (the variable already has static duration); on a local variable, it gives the variable static duration.
- When used with a function that's not a member of a class, the static keyword gives it internal linkage.
- For class member variables and functions, as in Java, making them static means we can use them without creating an instance of the class.
Conclusion #
I went into this thinking C++ would be mostly like Java except with pointers and references, and I have only been proven wrong so far. Though they bear some similarities, especially in their syntax, I've been surprised by how different they are and I'm looking forward to exploring that further in subsequent posts.
Further Reading #
- Variable assignment and initialization - Learn C++.
- Initializers - Microsoft C++ Docs.
- Initialization - cppreference.com.
- Introduction to the compiler, linker, and libraries - Learn C++.
- How does the compilation/linking process work? - Stack Overflow.
- Internal and External Linkage in C++ by Peter Goldsborough.
- Storage class specifiers - cppreference.com.