C++ Rvalues, Move Semantics, and Copy Elision.

After writing my first article on C++, which covered passing arguments by value, pointer, and reference, I received many requests to write about rvalues, move semantics, and copy elision. These terms are most likely new to people who use other programming languages, and even long-time C++ programmers may never have heard of them. This article explains what they mean with some simple and straightforward examples.

Rvalues

Rvalues are temporary values. They are so called because they are generally found on the right side of an assignment. They can be assigned to other variables but, for built-in types such as int, they cannot be the target of an assignment. Examples include non-string literals and calls to functions that return by value. In the following example, 1 and foo() are rvalues. They are used to initialize the variables a and b, which are called lvalues (‘left’ values).

int a = 1;
int b = foo();

The following statements do not compile (assuming foo() returns an int by value). 1 and foo() are temporaries. Assigning a value to them is not possible and has no semantic meaning.

1 = c; // error
foo() = d; // error

How do you tell if a value is a temporary? As a rule of thumb, a value is a temporary if it does not have an identifier or name. So once a value is stored in a named variable, that variable is not an rvalue. For example:

int bar = 1; // bar is a name, so it is an lvalue.
2; // 2 is an integer literal, it has no name (no identifier). It is
   // an rvalue.

Another test is to apply the address-of operator (&). It is not possible to take the address of an rvalue.

int a = 1;
&a; // Works, a is a stack-allocated variable.
&1; // Error, 1 is an integer literal.
&foo(); // Error, cannot take address of the temporary result of a 
        // function call.

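Rvalues matter because of what C++11 built on top of them. C++11 added rvalue references (written T&&), which let a function detect that its argument is an rvalue, and that is what makes something called a move possible.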
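To make the link between rvalues and moves concrete, here is a minimal sketch of how overload resolution tells the two categories apart. A parameter of type int& binds only to lvalues, and a parameter of type int&& (an rvalue reference) binds only to rvalues. Classify and ClassifyDemo are hypothetical names made up for this illustration, and foo() is a stand-in for the function used in the earlier examples.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for the article's foo(); the return value is arbitrary.
int foo() { return 7; }

// Selected when the argument is an lvalue (something with a name).
std::string Classify(int&) { return "lvalue"; }

// Selected when the argument is an rvalue (a literal, a temporary, or a
// variable that has been passed through std::move).
std::string Classify(int&&) { return "rvalue"; }

// Exercises both overloads.
void ClassifyDemo() {
  int a = 1;
  assert(Classify(a) == "lvalue");             // a has a name
  assert(Classify(1) == "rvalue");             // integer literal
  assert(Classify(foo()) == "rvalue");         // result of a function call
  assert(Classify(std::move(a)) == "rvalue");  // a cast to an rvalue
}
```

Calling ClassifyDemo() from main runs the checks. A move constructor relies on the same mechanism: it is simply the constructor overload that takes an rvalue reference.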

Move Semantics

Let’s say there is a big object and its data is being transferred to another object of the same type. In C++ and many other programming languages, this could be accomplished with an expensive copy.

SomeBigObject obj1;
SomeBigObject obj2 = obj1; // Expensive copy.

C++11 introduced the concept of a move to make this operation more efficient. A move allows one object to “take over” or “steal” another object’s data.

SomeBigObject obj1;
SomeBigObject obj2 = std::move(obj1); // Efficient move.

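Now obj2 holds what obj1 held, without copying any data, which makes the initialization of obj2 very efficient. Note that std::move does not move anything by itself: it casts obj1 to an rvalue so that the move constructor is selected. The catch is that obj1 is left in a moved-from state because its data members were just stolen, so it should not be used again except to destroy it or assign it a new value. For example: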

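struct SomeBigObject {
  // Constructor and other methods.
  ...

  // Copy constructor. Allocates a new Foo and copies the source's Foo
  // into it.
  SomeBigObject(const SomeBigObject& other)
      : foo(other.foo ? new Foo(*other.foo) : nullptr) {}

  // Move constructor. Takes over the source's pointer and leaves the
  // source holding nullptr.
  SomeBigObject(SomeBigObject&& other) noexcept : foo(other.foo) {
    other.foo = nullptr;
  }

  // Data members. Foo is allocated on the heap and could be large.
  Foo* foo;
};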

When an object of type SomeBigObject is initialized by moving another object of the same type, the move constructor is invoked. The move constructor steals the Foo pointer from the source object and sets the source object’s Foo pointer to nullptr. This is an O(1) operation no matter how large Foo is, and it leaves the source object in a moved-from state with foo set to nullptr: its Foo object was ‘stolen’.

This is different from copying another object of the same type, which invokes the copy constructor. In the copy constructor, the source object’s entire Foo member is copied into the target object’s foo member variable, which could be a very expensive copy.
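The difference between the two constructors can be checked by comparing pointers. The following is a minimal, self-contained sketch rather than a definitive implementation: the contents of Foo, the default constructor, the destructor, and the MoveDemo function are all assumptions added so that the example compiles on its own.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical payload; the article leaves Foo undefined.
struct Foo {
  std::vector<int> values;
};

struct SomeBigObject {
  // Assumed default constructor: allocates a Foo with 1000 elements.
  SomeBigObject() : foo(new Foo{std::vector<int>(1000, 42)}) {}

  // Copy constructor: allocates a new Foo and copies every element.
  SomeBigObject(const SomeBigObject& other)
      : foo(other.foo ? new Foo(*other.foo) : nullptr) {}

  // Move constructor: takes the pointer and leaves the source empty.
  SomeBigObject(SomeBigObject&& other) noexcept : foo(other.foo) {
    other.foo = nullptr;
  }

  // Assignment is left out to keep the sketch short.
  SomeBigObject& operator=(const SomeBigObject&) = delete;

  ~SomeBigObject() { delete foo; }

  Foo* foo;
};

void MoveDemo() {
  SomeBigObject obj1;
  Foo* original = obj1.foo;

  SomeBigObject copy = obj1;                // copy constructor
  assert(copy.foo != original);             // a separate allocation
  assert(copy.foo->values.size() == 1000);  // with its own elements
  assert(obj1.foo == original);             // source untouched

  SomeBigObject moved = std::move(obj1);    // move constructor
  assert(moved.foo == original);            // same allocation, nothing copied
  assert(obj1.foo == nullptr);              // source left empty
}
```

Calling MoveDemo() from main runs the checks. The destructor is safe to run on the moved-from object because deleting a null pointer does nothing.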

Leaving the source object in a moved-from state is undesirable in certain cases, but it is not a problem if the source object is a temporary rvalue, since nothing can refer to it afterwards. Moving an rvalue can be very useful. Some types, such as std::unique_ptr, cannot be copied at all and only allow moves.
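As a short illustration of a move-only type, the sketch below uses std::unique_ptr from the standard library. MakeValue and UniquePtrDemo are hypothetical helper names chosen for this example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// std::unique_ptr can be moved but not copied.
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr has no copy constructor");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr has a move constructor");

// Returns a temporary, which is an rvalue at the call site.
std::unique_ptr<int> MakeValue() { return std::unique_ptr<int>(new int(5)); }

void UniquePtrDemo() {
  // Initializing from a temporary needs no std::move.
  std::unique_ptr<int> first = MakeValue();

  // std::unique_ptr<int> second = first;          // would not compile
  std::unique_ptr<int> second = std::move(first);  // ownership is transferred

  assert(first == nullptr);  // a moved-from unique_ptr is guaranteed to be null
  assert(*second == 5);
}
```

Calling UniquePtrDemo() from main runs the checks. The named variable first needs an explicit std::move because it is an lvalue, while the result of MakeValue() is already an rvalue.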

Copy Elision

Let’s say there is a big object being returned from a builder function.

SomeBigObject BuildBigObject() {
  SomeBigObject foo;
  // Initialize foo...
  return foo;
}

SomeBigObject bar = BuildBigObject();

Conceptually, for bar to be initialized, the function-scoped foo has to be copied from the function’s stack frame into the outer-scoped bar variable, which is an expensive copy. What if bar could simply become the function-scoped foo, so that whenever foo is used, bar is used instead? This optimization is called copy elision. It is similar to, but not exactly the same as, a move.

Remember that the return value of BuildBigObject is a temporary rvalue, and that foo disappears when BuildBigObject’s stack frame is popped. The compiler can detect that foo is about to be destroyed and, instead of discarding it, make bar and foo refer to the same memory location during the invocation of BuildBigObject. That means that when BuildBigObject is initializing foo, it is actually doing work on the outer-scoped bar. This magic is called Named Return Value Optimization (NRVO), which is a specific application of copy elision. The standard permits NRVO but does not require it, so whether it happens depends on the compiler.
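One way to observe this is to count constructor calls. The sketch below uses a hypothetical Tracked type with counters (all names here are made up for the illustration). Because NRVO is optional, the only thing the example asserts is what holds either way: the object is never copied. When the compiler does elide, the move count is zero as well.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
  static int copies;
  static int moves;

  Tracked() {}
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) noexcept { ++moves; }
};

int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as BuildBigObject: one named local, returned by name.
Tracked BuildTracked() {
  Tracked foo;
  return foo;  // NRVO candidate
}

void ElisionDemo() {
  Tracked::copies = 0;
  Tracked::moves = 0;

  Tracked bar = BuildTracked();
  (void)bar;  // silence unused-variable warnings

  // Either the copy is elided, or the compiler falls back to the move
  // constructor. The copy constructor is not used in either case.
  assert(Tracked::copies == 0);
  // 0 when everything is elided; at most 2 without any elision.
  assert(Tracked::moves <= 2);
}
```

Calling ElisionDemo() from main runs the checks. With GCC or Clang, compiling with -fno-elide-constructors is one way to watch the move count change; the exact count depends on the language standard in use.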

Breaking down what actually happens, the program first allocates memory for bar and passes a pointer to that memory to BuildBigObject, which uses it instead of a separate foo. Now, whenever BuildBigObject refers to foo, it is actually dereferencing the pointer to bar. The result looks like that of a move, but it is not the same thing: with elision, no copy or move constructor runs at all.
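The transformation can be approximated by hand. The sketch below is only an illustration of the idea, not what any particular compiler emits: the caller reserves raw storage for bar, and the builder constructs the object directly in that storage with placement new. Big, BuildInto, and LoweringDemo are hypothetical names.

```cpp
#include <cassert>
#include <new>

// Hypothetical large type.
struct Big {
  int data[1024];
  int id;
};

// Hand-written stand-in for an elided builder: the caller supplies the
// storage, and the object is constructed directly inside it.
Big* BuildInto(void* storage) {
  Big* foo = new (storage) Big();  // "foo" lives in the caller's memory
  foo->id = 42;                    // initialization work happens in place
  return foo;
}

void LoweringDemo() {
  // Memory reserved in the caller's frame for bar.
  alignas(Big) unsigned char storage[sizeof(Big)];

  Big* bar = BuildInto(storage);

  // bar occupies exactly the storage the caller provided: nothing was
  // copied or moved out of the builder.
  assert(static_cast<void*>(bar) == static_cast<void*>(storage));
  assert(bar->id == 42);

  bar->~Big();  // objects created with placement new are destroyed manually
}
```

Calling LoweringDemo() from main runs the checks. In real code the compiler handles the storage and the destructor call itself, which is why the builder can keep the simple return-by-value signature.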

There are a few rules that BuildBigObject has to follow for copy elision to occur, and they are outside the scope of this article. A simple builder function without any branching, like the one shown above, should be enough for the compiler to apply it.