C/C++ language


Little items

Syntax for new(), new[](), delete() and delete[]()

Yes, there are two new and two delete operators (they are not functions).
new() allocates a single object and delete() frees it; new[]() allocates an array of objects and delete[]() frees it. You should always use delete() with new() and delete[]() with new[]().

Examples:

SomeObject *x = new SomeObject;  // use new()
SomeObject *y = new SomeObject(initialization, parameters, see, constructors);  // use new()
x->someMethod();  // call some method for x 
...
delete y;  // use delete()
delete x;  // use delete()
 
SomeObject *x = new SomeObject [20];  // use new[](), allocate 20 objects!
SomeObject *y = new SomeObject [20];  // use new[](); constructor arguments cannot be passed here, every element is default-constructed
x[0].someMethod();  // call some method for the zero-th object
y[19].someMethod();  // call last y object's method
...
delete [] y;  // must use delete[]() since it is an array
delete [] x;  // use delete[]()

Some old compilers accept a number inside the brackets of the delete, but it is ignored. Standard C++ does not allow it, so leave the brackets empty.

Compatibility of new operator on host and target

All of the following constructions are accepted by the g++ compiler on host, but some are not supported by the dcc compiler (for target).

Supported by the dcc compiler are:
unsigned long int **ul = new unsigned long int*; // 'one pointer to an unsigned long int'
char * ch = new char [10]; // 'array of characters'
RTDataObject *it = new RTDataObject [10]; // 'array of RTDataObjects'
RTDataObject **it = new RTDataObject* [10]; // 'array of pointers to RTDataObjects'
void **it = new void* [10]; // 'array of pointers (to something)'
void **it = new (void*) [10]; // 'array of pointers (to something)'
unsigned long int **ul = new unsigned long int* [10]; // 'array of pointers to unsigned long ints'
unsigned long int **ul = new (unsigned long int*) [10]; // 'array of pointers to unsigned long ints'
unsigned long int *ul = new unsigned long int [10]; // 'array of unsigned long ints'
unsigned long int *ul = new (unsigned long int) [10]; // 'array of unsigned long ints'

Not supported by the dcc compiler are:
char *ch = new (char) [10]; // 'array of characters'
RTDataObject *it = new (RTDataObject) [10]; // 'array of RTDataObjects'
RTDataObject **it = new (RTDataObject*) [10]; // 'array of pointers to RTDataObjects'

Conclusions:

  1. The dcc compiler seems to have problems with parentheses in the first argument (the type). So, even though it may be clearer to write the parentheses in the new expression, don't!
  2. Use only simple expressions for the length of an array. I.e.
    • Don't use function-calls
    • You may want to calculate the length in a temporary variable and use that variable.
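Conclusion 2 can be sketched as follows. The helper countItems() is a hypothetical stand-in for any function call that yields a length; the only point is that the value is stored in a plain variable before it appears between the brackets.

```cpp
#include <cstddef>

// Hypothetical helper: stands in for any function that yields a length.
std::size_t countItems()
{
    return 10;
}

char *makeBuffer()
{
    // Avoid: new char [countItems() + 1];  (function call inside the brackets)
    std::size_t length = countItems() + 1;  // calculate the length first
    char *buffer = new char [length];       // simple expression as array length
    buffer[0] = '\0';
    return buffer;                          // the caller releases it with delete []
}
```

The caller is expected to release the result with delete [], since it was allocated with new[]().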

Shallow, deep or member-wise copy

Below are three examples to describe the difference between a shallow copy, a deep copy and a member-wise copy. A standard copy-constructor in C++ does a member-wise copy!

In the following, I use an RTString as an example. An RTString has internally a pointer to the contents of the string, so in a picture it looks like this:

A shallow copy

Only copies the pointer, so the object is shared.

example:
RTString *shallow_copy = &original; // only the address is copied, the rest is shared

A deep copy

Make a complete separate object with contents.

example:
RTString deep_copy = original; // in C++ this would result in a member-wise copy, but OTD has overridden this behaviour to a deep copy

A member-wise copy

This is the default C++ behaviour, but in OTD it is overridden, so that a deep copy is done when you write something that is normally a member-wise copy. But be aware of this and second-guess yourself when you're not sure.

example:
RTPointer member_copy = original; // the pointer object is copied, but what it points to is not duplicated
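The three kinds of copy can be compared in a small sketch. It does not use the real RTString or RTPointer; Text and TextPointer are hypothetical stand-ins, where Text has a hand-written copy-constructor that makes a deep copy and TextPointer keeps the compiler-generated member-wise copy.

```cpp
#include <cstring>

// Hypothetical stand-in for a string class that owns its contents.
class Text
{
public:
    explicit Text(const char *s) : contents(new char[std::strlen(s) + 1])
    {
        std::strcpy(contents, s);
    }
    // Deep copy: duplicate the contents as well as the object.
    Text(const Text &other) : contents(new char[std::strlen(other.contents) + 1])
    {
        std::strcpy(contents, other.contents);
    }
    ~Text() { delete [] contents; }
    char *contents;
private:
    Text &operator=(const Text &);  // assignment is not needed for this sketch
};

// Hypothetical pointer wrapper with the default (member-wise) copy.
struct TextPointer
{
    Text *target;
};

bool copyKindsBehaveAsDescribed()
{
    Text original("abc");
    Text *shallow = &original;            // shallow: same object, only the address is copied
    Text deep = original;                 // deep: separate object with separate contents
    TextPointer pointer = { &original };
    TextPointer memberwise = pointer;     // member-wise: the pointer value is copied, not the Text
    shallow->contents[0] = 'X';           // changes original, but not deep
    return original.contents[0] == 'X'
        && deep.contents[0] == 'a'
        && memberwise.target == &original;
}
```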

Embedded C++ compatibility

This guideline states that we should only use the EC++ subset of C++. On page 36 of the A revision is a list of C++ elements that are not part of EC++ and consequently should not be used by us. Below is that list, with my comments added. All of these are typical C++ (i.e. non-C) features, so C programmers have probably never heard of them. But anyway, these elements are the more exotic elements of C++.

  1. Multiple (and virtual) inheritance and virtual base classes. Multiple inheritance can always be resolved by using aggregation (ask OO-Eddie :-) ). In OTD this can only be done when defining your own classes in your own .cpp file.
  2. Runtime type identification (RTTI). We can work around this by using methods like RTDataObject::getClassData(), RTActor::getTypeName(), RTActor::isSameType(), RTSequence::getSequenceData() and many others. Note that these functions are pretty expensive in that they compare strings and that kind of stuff.
  3. Namespace mechanism. We should use as few globals as possible, so all the definitions we do inside an actor are inside the actor class definition and thus already inside a namespace (a class is also a namespace).
  4. New-style casts (e.g. dynamic_cast). This is unfortunate, but we can always fall back on the C-type cast and on conversion functions as everyone is doing now.
  5. Hiding of benevolent side effects by the mutable construct. A member that is declared mutable may be modified even inside a const method, so in that sense mutable is the opposite of const.
  6. Exception handling, i.e. the try/catch construct. Again unfortunate, but we should not throw exceptions or use try/catch constructs to catch them. A work-around is the old-fashioned way of programming it all by yourself, i.e. anticipate every error situation and send appropriate messages.
  7. Templates, e.g. standard template library (STL). To overcome this, try to use base-classes that hold the functionality and sub-classes that 'adapt' the base-class for each type. Thus the base-class functions as a template.
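The work-around for item 7 could look like the sketch below. PointerStack and IntStack are hypothetical names, not OTD classes: the base-class holds all the functionality in terms of void*, and the sub-class only adapts it to one element type, using a C-type cast as item 4 prescribes.

```cpp
// Base-class: holds the functionality, knows nothing about the element type.
class PointerStack
{
public:
    PointerStack() : count(0) {}
    bool isEmpty() const { return count == 0; }
protected:
    bool pushItem(void *item)
    {
        if (count == Capacity) return false;  // full: report failure, no exception
        items[count++] = item;
        return true;
    }
    void *popItem() { return count > 0 ? items[--count] : 0; }
private:
    enum { Capacity = 16 };
    void *items[Capacity];
    int count;
};

// Sub-class: only adapts the base-class to one element type.
class IntStack : public PointerStack
{
public:
    bool push(int *item) { return pushItem(item); }
    int *pop() { return (int *)popItem(); }  // C-type cast, no new-style casts
};
```

A second element type needs another small sub-class, but the stack logic itself is written only once.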

Implicit conversion rules of C++

C++ can convert many types into many other types. This is convenient, but also hazardous:
It's convenient in int i = 3; float f = i + 1.8;
It's hazardous in protocol.send(message, &myObject); when an RTPointer is expected as data. In this case, the &myObject is cast to void* and the send(int, void*) is used instead of the (intended!) send(int, RTDataObject);
I think it's good to write out the implicit casts (i.e. make them explicit), but also know the rules that apply for implicit casts.

Parts of the text below are from

Overload resolution [i.e. which overload of a particular function to use] involves conversions, which may be needed to match a function signature to the types written in a call. [When the types differ from what is expected, a conversion (cast) must be done. If this cast is not written, C++ will insert an implicit cast according to the following rules of priority]

  1. Trivial conversions. These conversions are often 'unavoidable'; they are needed to meet C and C++ semantics. They do not affect the selection of an overload match, but a given conversion may be illegal, making the selected match illegal. (T stands for any type.) Any number may be applied in one step.
    T <=> T&
    T[] ==> T*
    T(argtypes) ==> T(*)(argtypes)
    T ==> const T
    T ==> volatile T
    T* ==> const T*
    T* ==> volatile T*
  2. Promotions. Promotion, or widening, is the conversion from a type with a given representation and width to a type with the same representation but a possibly greater width (e.g., short int to long int). Promotion from float to double is allowed, and from an unsigned type to a wider unsigned type. Conversions between signed and unsigned are not promotions.
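  3. Other built-in conversions. These are the remaining conversions between built-in types, e.g. between integer and floating-point types, between signed and unsigned types, and from a pointer to void*.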
  4. User defined conversions. Conversions by constructor or member conversion operator.
  5. A match of a function argument to the ellipsis ( the '...') is worse than all others.
Any number of trivial conversions may be used, but at most one user-defined and one built-in conversion may be applied when C++ is trying to match an argument's type to a function's type in a call.

Built-in conversions of built-in types around built-in operators are the 'usual arithmetic conversions' of ANSI C:

  1. If either operand is long double, the other is converted to long double, and the result is long double.
  2. Otherwise, if either is double, the other is converted to double, and the result is double.
  3. Otherwise, if either is float, the other is converted to float, and the result is float.
  4. Otherwise, chars (signed and unsigned), short ints (signed and unsigned), enumeration values, bools and bitfields are converted to int if int can hold their values, or to unsigned int if int cannot. (These are integral promotions.)
    Then
    1. If either operand is unsigned long, the other is converted to unsigned long, and the result is unsigned long.
    2. Otherwise, if one operand is a long int and the other an unsigned int, they are both converted to long int if long int can hold all the values of an unsigned int; otherwise, they are both converted to unsigned long. The type of the result is the type to which the operands are converted.
    3. Otherwise, if either operand is long, the other is converted to long, and the result is long.
    4. Otherwise, if either operand is unsigned, the other is converted to unsigned, and the result is unsigned.
    5. Otherwise, both operands are int and the result is int.
    These are the ANSI C++ rules. They preserve value first, then signedness. Because they depend on whether a long is wider than an unsigned int, they can give different results on machines with different widths for the built-in types. And there is no guarantee that an integer can be represented exactly in floating point. A 64-bit long (supported on some machines) can take on values that 64-bit doubles cannot.

The matching of arguments

When C++ encounters a function call with N arguments, it considers all visible function overloads of that name. It identifies the overloads eligible to be called with N arguments (including those using default arguments). Then it determines which of the eligible overloads can match the arguments provided, and attempts to find a 'best' match. To be a best match, an overload must match every argument at least as well as any other eligible overload, and match at least one argument better.

How is one match better than another? Some matches need conversions inserted, others don't. Some conversions are considered closer matches than others. And some matches are allowed only as a last resort.
If an overload's argument can be matched to the actual argument by several conversions, one better than the other, the better one will be taken as the overload's candidate. If there are two or more 'equally good' conversions, and any could be the best, the match is ambiguous and the overload cannot match the actual call.

For example:
int f(float, long);
int f(double, unsigned);
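Under the rules above, no conversion at all is a better match than promotion, and promotion a better match than a signed-unsigned change. (ISO C++ is stricter about what counts as a promotion: only the integral promotions to int or unsigned int and float to double. A standard-conforming compiler therefore ranks int=>long and int=>unsigned as equally good, and may resolve the calls below differently.)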
f(1, 1);  // call: f(int, int)
The choices to match an overload to the call are:
f(int=>float, int=>long)
f(int=>double, int=>unsigned)
The first overload provides a better match on the second argument, and an equally good match on the first argument.
f(0U, 1);  // call: f(unsigned, int)
The first overload provides a better match on the second argument and an equally good match on the first.
f(1.0F, 0);  // call: f(float, int)
The first overload provides better matches on both arguments.
f(1.0, 0);  // call: f(double, int)
The first overload provides a better match on the second argument but the second overload provides a better match on the first argument. The call is ambiguous.
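How a particular compiler resolves these calls is easy to check. The sketch below follows the ISO C++ ranking, where int=>long and int=>unsigned are equally good conversions, so only the first argument decides. Under that ranking the call with a double argument goes to the second overload instead of being ambiguous. The return values 1 and 2 are added here only to make the chosen overload visible.

```cpp
// Each overload returns a different number so the choice is visible.
int f(float, long)      { return 1; }
int f(double, unsigned) { return 2; }

// float matches the first overload exactly; the second needs float=>double.
int callWithFloat()  { return f(1.0F, 0); }

// double matches the second overload exactly; the first needs double=>float.
int callWithDouble() { return f(1.0, 0); }

// f(1, 1) and f(0U, 1) are left out: a standard-conforming compiler rejects
// them as ambiguous, because neither overload is better on any argument.
```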

Practical example:

protocol.send() has 2 overloads:
int RTEndPortRef::send(int, const RTDataObject&, int prio = General);
int RTEndPortRef::send(int, void* data = 0, int prio = General);
Which one is used for the following calls?

protocol.send(aMessage);
// ==> protocol.send(aMessage, 0, General);

The second overload is the only one that can be called with 1 parameter.

protocol.send(aMessage, "the data");
// ==> protocol.send(aMessage, (void*)"the data", General);

The first overload requires the second parameter to be converted from const char* to RTString (an RTString constructor (level 4)) and an up-cast to RTDataObject (a trivial cast (level 1), since RTDataObject is a base-class of RTString).
The second overload only requires the second operand to be converted from const char* to void* (a built-in conversion (level 3)), so the second is a closer match.

protocol.send(aMessage, 1);
// ==> protocol.send(aMessage, (void*)1, General);

The same as for the string above is valid here, but the const int must be converted to an RTInteger by the RTInteger constructor, which is harder than the cast from const int to void*

New scoping rules for declarations in for()

In C++ it is possible to declare variables in the initialization part of the for(), like in:

for (int i = 0; i < length; i++)
{
  do something;
}
What is the scope of i in this case? In other words: "Where is i known?"

This is a question they have discussed in the C++ world for a while. The original rule was:
i's scope is as if it is declared just before the for().

int i;
for (i = 0; i < length; i++)
{
  do something;
}

Under that rule, i is known after the for() as well, so it is an error to declare it again. E.g.:

for (int i = 0; i < length; i++)
{
  replaceSpace(s[i],'_');
}

for (int i = 0; i < length; i++)
{
  makeCapital(s[i]);
}
Under the original rule, this code-fragment is only ok if the declaration in the second for() is omitted.

Now the problem...
As I said, these rules took a while to take shape, and the C++ standard finally settled on the opposite: the scope of i ends at the end of the for() statement. The compiler we use (g++) follows this newer rule. This means that after the for(), the i is not known anymore and can be declared again (even stronger: it must be declared again if you want to use it). So in g++, the fragment above with the two declarations is correct and the version with the second declaration omitted will not compile, while a compiler that still follows the original rule behaves exactly the other way round!

This translation is compiler specific and should therefore not be used! I.e.: After the body of a for(), don't make any assumptions about the existence of variables declared in the initialization part of the for().
If you want to be on the safe side: Don't declare variables in the initialization part of the for().
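A version of the two loops that does not depend on the scoping rule could look like this. The function name tidy and the loop bodies are hypothetical replacements for replaceSpace() and makeCapital(), written out so that the sketch is complete.

```cpp
#include <cctype>

// Replaces spaces by underscores and then capitalizes, using one counter
// that is declared before the loops and is therefore known in both.
void tidy(char *s, int length)
{
    int i;  // declared once, outside the for() statements
    for (i = 0; i < length; i++)
    {
        if (s[i] == ' ') s[i] = '_';
    }
    for (i = 0; i < length; i++)
    {
        s[i] = (char)std::toupper((unsigned char)s[i]);
    }
}
```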

What's true in (result==true)

bool result = ~false;
if (result == true)
{
  printf("This is good\n");
}
else
{
  printf("This is not good\n");
}

What do you think is printed in this example? I don't know! It's compiler specific!

Why?
A computer really only works with numbers, so there must be a mapping from true/false to numbers. In C/C++, the definition of true and false is as follows: false is 0, and true is any value other than 0.

This means there is a 1-to-1 translation between 0 and false, but not between a number and true. Therefore, you can compare with false, since false is one unique value. You cannot compare with true, because true is not one single value, it's a range of values.
This is illustrated in the above example. This is what one would think happens:
  1. result is initialized with the opposite of false, which is true.
  2. result (true) is compared with true.
  3. the operands are equal, so the comparison yields true, so "This is good" is printed
But how else can the compiler interpret it? For this, we have to look at the real values:
  1. result is initialized with the one's complement of false (=0), which is 0xFF.
  2. result (0xFF) is compared with true. In the comparison, the concept true is replaced by a real number. Let's say 1.
  3. the operands (0xFF and 1) are not equal, so the comparison yields false, so "This is not good" is printed.
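A third option would be that the compiler has overloaded comparison operators for bool, which should make the example work correctly. With the built-in bool type of standard C++, converting any non-zero value to bool yields exactly true, so there the example prints "This is good"; the problem remains wherever bool is only a typedef or macro for an integer type.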

What should you do?

  1. Do not compare against true and false directly.
  2. Don't mix numbers with bools.
The following has no problems:
bool result = ~false;
if (result)
  ...
If we want to take numbers into the boolean domain, we should use the comparison operators:
FILE *f = fopen("something","r");
bool success = (f != NULL);
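The second interpretation can be reproduced on any compiler by emulating a bool that is only a small integer. OldBool, OLD_TRUE and OLD_FALSE are hypothetical names for such an emulation; they are not taken from any real header.

```cpp
// Hypothetical emulation of a compiler where bool is just a small integer type.
typedef unsigned char OldBool;
const OldBool OLD_FALSE = 0;
const OldBool OLD_TRUE = 1;

// Compares against "true" directly: fails for every non-zero value except 1.
bool comparesEqualToTrue()
{
    OldBool result = (OldBool)~OLD_FALSE;  // one's complement of 0: all bits set
    return result == OLD_TRUE;
}

// Tests the value itself: any non-zero value counts as true.
bool testsAsTrue()
{
    OldBool result = (OldBool)~OLD_FALSE;
    return result != 0;
}
```

Here comparesEqualToTrue() yields false although result is 'true' in the C sense, while testsAsTrue() yields true.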

When to use const &

The idea behind a constant reference is the following:

  1. We want to pass a large object as a parameter or a return value, but instead of copying the object and passing it by value, we'd rather pass it by reference, i.e. just pass a pointer (a reference in this case) to the object.
  2. A pointer or reference gives you the possibility to access and modify the referenced object. If the intent was a call by value, the caller expects the passed object to remain unchanged. To enforce this, one can use the const.
Concluding, we can say that a reference (or pointer) prevents a private copy of the object from being made. Instead it gives a reference for read and write access to the original object.
To limit the access to just read access, the const is used. So const is counteracting the reference in a way.
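The three ways of passing an object can be put side by side. Measurement is a hypothetical large type, made up for this sketch.

```cpp
// Hypothetical large object.
struct Measurement
{
    int samples[256];
    int count;
};

// By value: the whole struct is copied on every call.
int firstSampleByValue(Measurement m)
{
    return m.count > 0 ? m.samples[0] : 0;
}

// By const reference: no copy, and the function cannot modify the caller's object.
int firstSampleByRef(const Measurement &m)
{
    return m.count > 0 ? m.samples[0] : 0;
}

// By non-const reference: no copy, but the caller's object may be changed.
void clear(Measurement &m)
{
    m.count = 0;
}
```

Writing m.count = 0 inside firstSampleByRef() would be rejected by the compiler, which is exactly the guarantee the caller wants.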

Dangers of overloading and overriding

In the BDH OTD model, we are now going to rely more and more on functions and inheriting them instead of separate non-inheriting state machines.
This poses the problem of "Which function will be called?", since several functions may apply (see also "Implicit conversion rules of C++").

Aspects:

  1. Functions may have overloads.
  2. Functions may have overrides.
These aspects have to do with a function's signature.
What is a function's signature?
This consists of 3 parts:
  1. The name of the function.
  2. The types of the input arguments (parameters).
  3. The const-ness of the function.
This means that the return-value is totally unimportant for the signature.

What is overloading?
If at a particular point in the program, several functions with the same name, but with different parameters and/or const-ness are visible (i.e. they are in scope), those functions are said to be "overloads of each other".
All overloads can be called, because they are discriminated by signature.

Example:

class RTString : public RTDataObject
{
public:
  RTString(const RTString&);
  RTString(const char*);

  char* getContents(void);
  const char* getContents(void) const;
};
  1. The two constructors are overloads of each other, because they have different parameters.
  2. The two getContents() functions are also overloads, because the second function is const. This is the const at the end of the declaration (which is called ReadOnly in OTD) and not the const of the return value, because the return value is no part of the signature.

What is overriding?
Overriding is when a sub-class defines a function with exactly the same signature as in the base-class. This sub-class function overrides the base-class function. In effect, the sub-class function masks the base-class function.
By default, only the override is visible and the overridden is not.

Example:

class RTDataObject : public RTObject
{
public:
  virtual RTDataObject* copy(void) const;
  ...
};

class RTString : public RTDataObject
{
public:
  virtual RTDataObject* copy(void) const;
  ...
};
Here we see a function that is overridden in every class. If we normally call copy() in an RTString-method, we get the override and the override is said to mask the overridden function.
(It is possible to call the RTDataObject's copy() directly by calling RTDataObject::copy().)

The danger is that one wants to use override (i.e. substitute base-class behaviour with sub-class behaviour), but one doesn't use the exact same signature, so that a new function is created and not an override. In this case, the function that should be overridden is not replaced: a call through a base-class pointer or reference still ends up in the base-class version!
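A minimal sketch of this danger, with hypothetical classes Shape and Triangle: the sub-class forgets the const, so its function has a different signature and does not override anything.

```cpp
class Shape
{
public:
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

class Triangle : public Shape
{
public:
    // Intended as an override, but the const is missing: this is a new function.
    virtual int sides() { return 3; }
};

// Calls sides() through the base-class, as generic code would do.
int sidesThroughBase(const Shape &shape)
{
    return shape.sides();
}
```

Calling sides() directly on a Triangle gives 3, but sidesThroughBase() gives 0 for the same object, because Shape::sides() const was never overridden.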

The rules for determining which function to call in an expression are:

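  1. Collect all the overloads visible at the expression's place. The search stops at the first class that declares the name, so a function in a sub-class masks all base-class functions with the same name, whatever their signature.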
  2. Select the best match for the parameters (see also "Implicit conversion rules of C++").
  3. Select the right const-ness.

Example:

int f(int);  // this is ::f(int), because it is defined in global namespace

class A
{
public:
  int f(int);
  int f(void);
  int f(char) const;
};
class B : public A
{
public:
  int g(void) { return f(); }
};
class C : public B
{
public:
  int f(int size = 0);
  int f(char);
};

int main(void)
{
  int i;
  A a;
  const C c;

  i = f(1);      // see note 1
  i = a.f(2);    // see note 2
  i = c.f();     // see note 3
  i = c.f('a');  // see note 4
  i = c.g();     // see note 5
}
note 1: step 1: ::f(int) is the only function with this name that is visible.
step 2: f(1) matches on f(int).
step 3: const-ness is ok.
note 2: step 1: Visible are: A::f(int), A::f(void) and A::f(char) const (::f(int) is masked).
step 2: f(2) matches on A::f(int).
step 3: const-ness is ok.
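note 3: step 1: Class C declares its own functions named f, and such a declaration masks every f of the base-classes, whatever its signature. Visible are therefore only C::f(int size = 0) and C::f(char); A::f(int), A::f(void) and A::f(char) const are all masked.
step 2: f() matches C::f(int size = 0), by using the default argument.
step 3: c is a const object and C::f(int size = 0) is non-const, so the compiler rejects the call.
note 4: step 1: Visible are again only C::f(int size = 0) and C::f(char).
step 2: f('a') matches on C::f(char).
step 3: c is a const object and C::f(char) is non-const, so the compiler rejects this call as well. A::f(char) const would fit, but it is masked and can only be reached by writing c.A::f('a').
note 5: g() is not declared in class C, so the search continues in the base-classes and B::g(void) is found, that's easy. (g() is non-const, so the call is only accepted when c is not a const object.) But now f()...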
step 1: f() is called from class B, so only A::f(int), A::f(void) and A::f(char) const are visible.
step 2: f() matches A::f(void).
step 3: const-ness is ok.
Note that class C also knows C::f(int size = 0), which also matches the call, but because the actual call is done from class B, this is also the class where the visibility is determined.
Compare this result with note 3.
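The masking of base-class functions by name can be checked with a small sketch that follows standard C++ name lookup. Base and Derived are hypothetical classes, and the return values only identify which function was called.

```cpp
class Base
{
public:
    int f(void) { return 1; }
    int f(int)  { return 2; }
};

class Derived : public Base
{
public:
    int f(char) { return 3; }  // masks Base::f(void) and Base::f(int)
};

// Only Derived::f(char) is visible, so the int is converted to char.
int callWithInt(Derived &d)   { return d.f(65); }

// Explicit qualification makes the masked Base::f(int) reachable again.
int callQualified(Derived &d) { return d.Base::f(65); }

// d.f() does not compile: Base::f(void) is masked and Derived::f(char)
// needs an argument.
```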

Slicing

Every class occupies a certain amount of memory (pointed to by the this pointer). When inheriting, a subclass gets all that the baseclass has, so also the data. So, with single inheritance, the first part of an object's memory is the same as that of its baseclass and the rest is specific for that class, see the example below:

class A
{
  int i;
};
class B : public A
{
  int j;
};
class C : public B
{
  int k;
};
class D : public A
{
  int x;
};
memory-map:
  A: | i |
  B: | i | j |
  C: | i | j | k |
  D: | i | x |


An inheritance relation is sometimes called an "is-a" relation, because class C is-a class B. But if you want to copy a class C, you really want to copy it as a class C, because if you copy it as a class B, only the B stuff (i.e. int i and int j) is copied! This is called Slicing: You slice-off a part of C and use that.

Below are a few examples where this can happen:

Example 1: Return as baseclass

SomeBaseClass bad(void)
{
  SomeSubClass a;
  // ...
  return a;
}
SomeBaseClass& good(void)
{
  // Cannot return a pointer to a local variable, therefore, make the variable
  // on the heap (or return a member variable).
  SomeSubClass *a = new SomeSubClass;
  // ...
  return *a;
}
Slicing occurs here at the return, where only the baseclass-part is copied and returned. Remember that the lifespan of a local variable ends when the function ends (i.e. just after the return), therefore, a must be copied.

Example 2: Assign to baseclass

SomeBaseClass a;
SomeSubClass b;

a = b;
a's assignment (SomeBaseClass::operator=()) is used and that one only copies the baseclass stuff.
(SomeSubClass)a = b;
It is not legal to cast a baseclass to a subclass.

Probably more appropriate (and at least correct) would be:
SomeBaseClass *a;  // now a pointer!
SomeSubClass b;

a = &b;  // do not really copy
a = new SomeSubClass(b);  // make a new object, by using the copy-constructor

Example 3: Using a baseclass reference

SomeSubClass a, b;
SomeBaseClass &c = a;  // nothing wrong here, just a reference
c = b;  // should copy contents of b into a (both of type SomeSubClass)
In the last assignment, again the SomeBaseClass::operator=() of c is used, so the object b is sliced. The assignment from a to c is not a problem, since only the reference is copied and the object is not.

A general rule is: When using a baseclass to manipulate a subclass, use pointers or references, so that the data doesn't get copied. If you really want to copy it, use the copy-method (available for RTDataObject) or see to it that the copy is done in the right way (probably, you can add some comments in the code :-) ).
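The difference between a sliced copy and a copy made through a copy-method can be made visible as follows. Animal and Bird are hypothetical classes; their copy() only mimics the idea of the copy-method mentioned above and is not the RTDataObject implementation.

```cpp
class Animal
{
public:
    Animal() : legs(4) {}
    virtual ~Animal() {}
    virtual Animal *copy() const { return new Animal(*this); }
    virtual int wingCount() const { return 0; }
    int legs;
};

class Bird : public Animal
{
public:
    Bird() : wings(2) { legs = 2; }
    virtual Animal *copy() const { return new Bird(*this); }
    virtual int wingCount() const { return wings; }
    int wings;
};

int wingsAfterSlicing(const Bird &bird)
{
    Animal sliced = bird;         // only the Animal part is copied
    return sliced.wingCount();    // the copy is a plain Animal
}

int wingsAfterCopyMethod(const Bird &bird)
{
    const Animal &base = bird;    // a reference: nothing is copied yet
    Animal *whole = base.copy();  // virtual copy(): a complete Bird is made
    int wings = whole->wingCount();
    delete whole;
    return wings;
}
```

For a Bird, wingsAfterSlicing() reports 0 wings and wingsAfterCopyMethod() reports 2.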

Twice the same method?

As described in Dangers of overloading and overriding,
Class::method() and Class::method() const have different signatures and thus are overloads and not overrides. Therefore, it is possible to define the following:

class Data
{
private:
  int x;
  int y;
public:
  int getX(void) const { return x; }
  int getY(void) const { return y; }
  bool setX(int val) { x = val; return true; }
  bool setY(int val) { y = val; return true; }
};
class DataClass
{
private:
  Data data;
public:
  Data& getData(void) { return data; }
  const Data& getData(void) const { return data; }
};
The two getData methods are the interesting part. It looks a bit overdone, but otherwise, the following code-fragments wouldn't work:
const DataClass data1;
int x = data1.getData().getX();  //everything is const here ==> safe!
data1.getData().setX(5);  //won't work, since result of getData() is const
                          //  and setX() is non-const.
DataClass data2;
data2.getData().setX(5);  //does work: because data2 is non-const, the non-const
                          //  version of getData() is taken, which also
                          //  returns non-const.
Please have a look at this. It may look a bit weird at first, but this is when const shows its importance: You can return something different in case of const.

There is one thing that I didn't mention and that is why a const object prefers the const overload and a non-const object prefers the non-const overload. This is explained in the next section, Twice the same method? one step further.

Twice the same method? one step further

As explained in the previous section, Twice the same method?, different overloads can be chosen, depending on the const-ness of the object itself. How that is done is the topic of this section.
At the end of this section, two (gcc 2.95.2) errors will be explained:

"passing `const String' as `this' argument of `String::operator char *()' discards qualifiers"
and
"choosing `String::operator char *()' over `String::operator const char *() const'
   for conversion from `String' to `const char *'
   because conversion sequence for the argument is better"

Let's take a look at this example:

class String
{
public:
  String(const char* other = NULL);
  String(const String& other);

  operator char*(void);
  operator const char*(void) const;
};
The advantage here is that the String::operator const char*() may be optimized, because it is only used when the buffer is read-only!

Let's say we have the following code:
      String  string;
const String  const_string;
        char *pstr;
  const char *const_pstr;

pstr = string;              // should call operator char*()
const_pstr = const_string;  // should call operator const char*()
The reason that this works is because the string object is passed as the this pointer to the method as an implicit argument. If you would write this argument explicitly (this is just for explanation, it's not valid C++!!), you would get something like this:

'real' C++                          Showing the implicit this pointer
operator char*(void)                operator char*(String* this)
operator const char*(void) const    operator const char*(const String* this)

Note that the const has moved from behind the function to the this type qualifier, because that is what it means: "Keep the object where this points to constant".
If we now look at the topic on argument matching in Implicit conversion rules of C++, we see that the address of string perfectly matches operator char*(String* this) and the address of const_string matches operator const char*(const String* this). Remember that the return value is not important for finding the right overload of a function.
Thus for the assignment statements, we get the following type conversions:
pstr = string: String -> char*
const_pstr = const_string: const String -> const char*

What about the other possible assignments, I hear you ask? They are:

pstr = const_string;  // rejected: operator char*() would be needed, but it cannot be called on a const object (the "discards qualifiers" error)
const_pstr = string;  // accepted, but see below
Ok, so far, everything seems pretty logical (remember that implicit conversions and argument matching is one of the hardest things in C++).
But look at the implications of the last statement: If you assign a string to a const pointer, the char* conversion will be used instead of the const char*, which might seem much more logical:
const_pstr = (char*)string;        // this is what the compiler chooses
const_pstr = (const char*)string;  // and not this
If you think this looks weird, the gcc 2.95.2 compiler agrees, so it issues a warning for this case:
"choosing `String::operator char *()' over `String::operator const char *() const'
   for conversion from `String' to `const char *'
   because conversion sequence for the argument is better"

(Looking at the above explanation, do you know what the compiler is saying here?)

To solve this, you need to force the compiler. You can go two ways:
  1. Explicitly allow it to use the operator char*() by explicitly specifying the conversion:
    const_pstr = (char*)string;
  2. Force it to use the operator const char*() by 'helping' the compiler to match the arguments. Do this by casting the string to a constant reference to itself. This is done safely by:
    const_pstr = const_cast<const String&>(string);
    This is probably the better option, because it does use the operator const char*() conversion to get the const char*, which is what we wanted after all.
Why did I use that complicated const_cast thing in the last example?
That is because it is safer!
const_cast is only allowed to change the const-ness of an object. It cannot convert from one type to another, even if that type is a subclass (remember that a 'normal' cast can convert a subclass to a baseclass).
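Which conversion operator is really used can be recorded in a small experiment. Label is a hypothetical class with the same two operators as String above; the static member lastUsed is added only to record the choice. Some compilers may warn about the first assignment, as described above.

```cpp
class Label
{
public:
    Label() { text[0] = 'x'; text[1] = '\0'; }
    operator char*(void)             { lastUsed = 1; return text; }
    operator const char*(void) const { lastUsed = 2; return text; }
    static int lastUsed;  // 1: non-const operator used, 2: const operator used
private:
    char text[2];
};
int Label::lastUsed = 0;

int usedForPlainAssignment()
{
    Label label;
    const char *p = label;                             // this matches Label* best: operator char*()
    (void)p;
    return Label::lastUsed;
}

int usedWithConstCast()
{
    Label label;
    const char *p = const_cast<const Label &>(label);  // forced to operator const char*()
    (void)p;
    return Label::lastUsed;
}

int usedForConstObject()
{
    const Label label;
    const char *p = label;                             // only the const operator can be called
    (void)p;
    return Label::lastUsed;
}
```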

Initializers

Constructors can have initializers. This is the part between the parameters list and the body, starting with the ':' (colon).
Here you can initialize members (also const members). E.g.:
class A
{
private:
  const int maxsize;
  int *array;
  int size;
public:
  A(int, int*);
};
A::A(int _size, int* _array)
  : maxsize(100)  //initializer
{
  array = _array;  //could also be done in initializer
  size = min(_size, maxsize);  //could also be done in initializer
}
A::maxsize cannot be set in the constructor-body, because it's const and you cannot assign to a const member. A const can only be initialized once and never assigned to. Compare this to normal C:
const int x = 0;  //initialization is allowed
x = 10;  //assignment isn't
A baseclass constructor is also called in the initializer (you can put entire expressions in the initializer):
class B : public A
{
private:
  const char *name;
public:
  B(void);
};
B::B(void)
  : A(50, new int[50]), name("B-class")  //initializer with expressions
{
  //empty body
}

Virtual methods

Virtual methods are the way C++ implements polymorphism. But when is which function called? For that it is convenient to know how it is implemented at a lower level. This helps to get rid of the mystic "The function is determined at runtime".
Before we know which function is called, C++ tries to match its actual arguments to the function signatures that are defined. This is called argument matching and is described in Implicit conversion rules of C++. Also const and non-const functions play a part in that, see Dangers of overloading and overriding and Twice the same method?. And of course, we always have to be aware of Slicing, which can occur if you copy a subclass.

How is a normal function called ?

For calling an ordinary function, you only need the code-pointer where the code is. Let's say we have the following piece of code:

void f(int) {}

f(3);
Then the function-call results in the following assembly code:
push  3       ;put the argument on the stack
call  f       ;jump to the function-code

Conclusion: A normal function is just a piece of code. Because a normal function only describes code, it can only be used to describe algorithms.
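
That a normal function is nothing more than the address of some code can be made visible with a function pointer. The following is a sketch only; twice, square and apply are hypothetical names chosen for the illustration.

```cpp
// Two hypothetical functions with the same signature.
int twice(int x)  { return 2 * x; }
int square(int x) { return x * x; }

// A function pointer stores the address of the code to jump to.
typedef int (*IntFunc)(int);

// The call through the pointer is the same "push argument, call address"
// sequence as a direct call; only the address comes from a variable.
int apply(IntFunc code, int arg)
{
  return code(arg);
}
```

apply(twice, 3) and apply(square, 3) run the same calling code but jump to different addresses.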

How is a method called ?

For a method, it is a bit different, because a method also needs access to the object on whose behalf it is called. For this, it needs an implicit, hidden, extra argument: the this-pointer.
Let's say we have the following piece of code:

class A
{
private:
  int a;
public:
  void f(int i) { a = i; }
};

A a;
a.f(3);
This method-call results in the following assembly code:
push  &a      ;arg1: a pointer to the calling object, the this-pointer
push  3       ;arg2: the explicit argument
call  f       ;jump to the method code
And this is how the function is translated into assembly code:
mov  *(arg1 + 0), arg2
return
arg1 is the implicit this-pointer. 0 is the offset of the first member variable, a (this class has no virtual methods, so nothing is stored in front of it). This instruction thus moves the value of arg2 (3) into the place where the variable a is stored in the calling object.

Conclusion: A method is a function that has access to the data of the object on whose behalf it is called. Because a member function is associated with data, it can be used for a variety of things: algorithms (with or without state) and access to member variables.
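
The hidden this-pointer can be written out by hand. In the sketch below, A_f is a hypothetical free function that does the same as the method A::f, but takes the object pointer as an explicit first argument; the constructor and get() are additions for the illustration only.

```cpp
class A
{
private:
  int a;
public:
  A() : a(0) {}
  void f(int i) { a = i; }           // the this-pointer is passed implicitly
  int get() const { return a; }      // hypothetical accessor for illustration
  friend void A_f(A *self, int i);   // hypothetical equivalent, see below
};

// The same method with the hidden argument made visible:
// 'self' plays the role of 'this'.
void A_f(A *self, int i)
{
  self->a = i;
}
```

The calls a.f(3) and A_f(&a, 3) have the same effect on the object a.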

How is a virtual method called ?

If we now have the case of virtual methods, the called method is only known at run-time. See for example the next piece of code:

class A
{
private:
  int a;
public:
  virtual void f(int);
  virtual void g(int);
};

class B : public A
{
public:
  virtual void f(int);  //this is an override
  virtual void h(int);
};

A a;       //object of type A
B b;       //object of type B
A *pa;     //pointer to object of type A (its static type)

pa = &a;   //pointer is now pointing to object of type A (its current dynamic type)
pa->f(3);  //call f(3) on behalf of object a to which pa is pointing

pa = &b;   //pointer is now pointing to object of type B (its current dynamic type)
pa->f(3);  //call f(3) on behalf of object b to which pa is pointing
First pa is made to point to object a and the function A::f(int) is called, because pa is pointing to an object of type A. Later, pa is made to point to object b and the function B::f(int) is called, because pa is pointing to an object of type B.
How is this accomplished?
This is done by means of a Virtual Method Table (VMT). A VMT is a table that is defined for each class that has virtual methods.
The last four statements are translated into the following assembly code (Note that the compiler knows that f is a virtual method, so the generated assembly code is different from the code for a non-virtual method-call):
mov   pa, &a        ;pa is now pointing to object a
push  pa            ;arg1: a pointer to the calling object, the this-pointer
push  3             ;arg2: the explicit argument
mov   VMT, *(pa+0)  ;VMT is the pointer to the Virtual Method Table
call  *(VMT+0)      ;jump to the method code A::f(int)

mov   pa, &b        ;pa is now pointing to object b
push  pa            ;arg1: a pointer to the calling object, the this-pointer
push  3             ;arg2: the explicit argument
mov   VMT, *(pa+0)  ;VMT is the pointer to the Virtual Method Table
call  *(VMT+0)      ;jump to the method code B::f(int)
Explanation:
pa is the this-pointer of the object for which we want to call a method. Every object that has virtual methods has a pointer in its data that points to the VMT of its class. This pointer is stored as the first field in the object's data (i.e. the data to which the this-pointer is pointing). Therefore, *(pa+0) is the value of this pointer to the VMT. See also the figure below.
This VMT is used to get a pointer to the function that should be called.
Note that the code to call A::f(int) and B::f(int) is the same! The only difference is that a different VMT is retrieved, because pa is pointing to a different object. Therefore, the actual function called is determined at run time by the object pa is pointing to!
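
The same behaviour can be observed from plain C++. In this sketch the methods return a string instead of void (an assumption made only so the result can be checked), and callF and callG are hypothetical helper functions. Each helper contains exactly one call site, which is used for both kinds of objects.

```cpp
#include <string>

class A
{
public:
  virtual ~A() {}
  virtual std::string f(int) { return "A::f"; }
  virtual std::string g(int) { return "A::g"; }
};

class B : public A
{
public:
  virtual std::string f(int) { return "B::f"; }  // this is an override
  virtual std::string h(int) { return "B::h"; }  // new method
};

// One call site; the method that runs depends on the object behind pa.
std::string callF(A *pa) { return pa->f(3); }
std::string callG(A *pa) { return pa->g(3); }
```

Passing &a to callF gives "A::f" and passing &b gives "B::f", while callG gives "A::g" for both, because g is not overridden.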



About the figure: The rectangular boxes are associated with classes and the rounded boxes are associated with objects. The VMTs contain pointers to the virtual methods defined for that class. The VMT of A points to A::f(int) and A::g(int). The VMT of B points to B::f(int), because this is an override, to B::h(int), which is a new method, and also to A::g(int), because it is not overridden and is inherited from class A.
a and b are instances of the classes A and B and thus both contain an instance of member variable a. They also contain a pointer to the proper VMT.
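
The structure described above can be rebuilt by hand with function pointers. This is a sketch of the idea, not what any particular compiler generates: the names Object, Method, VMT_A, VMT_B and callSlot are hypothetical, as is the slot order (0 = f, 1 = g, 2 = h).

```cpp
#include <string>

struct Object;  // forward declaration
typedef std::string (*Method)(Object *self, int arg);

// Every object starts with a pointer to the table of its class,
// followed by its data.
struct Object
{
  const Method *vmt;  // first field: pointer to the Virtual Method Table
  int a;              // member variable a
};

std::string A_f(Object *, int) { return "A::f"; }
std::string A_g(Object *, int) { return "A::g"; }
std::string B_f(Object *, int) { return "B::f"; }
std::string B_h(Object *, int) { return "B::h"; }

// One table per class. B reuses A_g because g is not overridden.
const Method VMT_A[] = { A_f, A_g };
const Method VMT_B[] = { B_f, A_g, B_h };

// A virtual call: fetch the table from the object, then call through a slot.
std::string callSlot(Object *obj, int slot, int arg)
{
  return obj->vmt[slot](obj, arg);
}
```

An Object whose vmt field points to VMT_A behaves like an instance of A, and one whose vmt field points to VMT_B behaves like an instance of B, although callSlot itself never changes.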

Conclusion: A virtual method can be used to separate the interface and the implementation: the interface is defined by the base-class (this method can even be abstract) and the implementation is defined by the subclass. The consequence is that the implementation can be different for every subclass.
Why would we want this?
One reason is that we want to have different implementations of an algorithm (this is called the Strategy Pattern). Another reason is that the algorithm needs information that only the subclass has (this is called the Template Method Pattern).
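
Both reasons can be illustrated with a few lines. The classes below (Operation, Doubler, Negater, Report, SalesReport) are hypothetical and kept as small as possible; they are a sketch of the two patterns, not a complete treatment.

```cpp
#include <string>

// Strategy: the whole algorithm sits behind one interface and is replaceable.
class Operation
{
public:
  virtual ~Operation() {}
  virtual int apply(int x) = 0;  // abstract: interface only
};

class Doubler : public Operation
{
public:
  virtual int apply(int x) { return 2 * x; }
};

class Negater : public Operation
{
public:
  virtual int apply(int x) { return -x; }
};

int run(Operation &op, int x) { return op.apply(x); }

// Template Method: the base class fixes the skeleton of the algorithm and
// asks the subclass for the one piece of information it does not have.
class Report
{
public:
  virtual ~Report() {}
  std::string render() { return "[" + title() + "]"; }  // fixed skeleton
protected:
  virtual std::string title() = 0;                       // known by the subclass
};

class SalesReport : public Report
{
protected:
  virtual std::string title() { return "Sales"; }
};
```

run() works with any Operation, and Report::render() never needs to change when a new kind of report is added.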

This VMT-pointer is basically what is known as the dynamic type, and Run-Time Type Information (RTTI) can also be based on it: if we have an object, we can compare its VMT-pointer to all the known VMTs, and the one that matches identifies the current dynamic type.
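
In C++ source code the dynamic type is queried with typeid and dynamic_cast. Whether a given compiler implements these by looking at the VMT-pointer is an implementation detail; the sketch below only shows the language-level view, with hypothetical helper functions isExactlyB and isB.

```cpp
#include <typeinfo>

class A
{
public:
  virtual ~A() {}  // at least one virtual method is needed for RTTI
};

class B : public A
{
};

// typeid on a dereferenced pointer yields the dynamic type of the object.
bool isExactlyB(A *pa) { return typeid(*pa) == typeid(B); }

// dynamic_cast succeeds if the object is a B or a subclass of B.
bool isB(A *pa) { return dynamic_cast<B*>(pa) != 0; }
```

Both helpers answer "no" for a pointer to an A object and "yes" for a pointer to a B object, even though the static type of the pointer is A* in both cases.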

Other languages

C++ uses the keyword virtual to declare a virtual function. The keyword may be repeated on the overrides in sub-classes, but it is optional there: a function that matches a virtual function of the base class is virtual automatically. Non-virtual functions are the default and don't require special keywords.
In Java, instance methods are virtual by default (as far as I know) and don't require any special keyword; static, private and final methods are the exceptions.
C# has some more keywords. Like in C++, the default is non-virtual. To declare a function as virtual, use the keyword virtual. To declare an override for a virtual function, use the keyword override. To hide an inherited function with a new one of the same name instead of overriding it, use the keyword new (which is unrelated to the memory allocation operator).


Last Updated: July 22, 2008