Friday 23 July 2010

A reminder about boxing and unboxing in C#

Here is a quick reminder about boxing and unboxing in C#. The MSDN documentation contains this concise description of the two operations:

“Boxing is the process of converting a value type to the type object or to any interface type implemented by this value type. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap. Unboxing extracts the value type from the object. In the following example, the integer variable i is boxed and assigned to object o.” *

The simple built-in types (byte, char, int, long, etc.) and structures are value types derived from System.ValueType. Local variables of value types are typically stored on the Stack, so they can be created and accessed more cheaply than reference types. Instances of reference types are stored on the Heap and must be accessed through references.

An example of boxing and unboxing:

int i = 123;
object o = (object)i; // boxing (the cast is optional; the conversion is implicit)
int j = (int)o; // unboxing
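
Two details of the example above are easy to miss: the box holds a copy of the value, and unboxing must name the exact value type that was boxed. A minimal sketch of both points follows; the class and method names are made up for illustration.

```csharp
using System;

static class BoxingDemo
{
    // Returns the value held in the box after the original variable has changed.
    public static int BoxHoldsCopy()
    {
        int i = 123;
        object o = i;   // boxing: the value 123 is copied into a new heap object
        i = 456;        // changes the local variable only, not the box
        return (int)o;  // unboxing: still 123
    }

    // Returns true if unboxing a boxed int as a long throws.
    public static bool UnboxToWrongTypeThrows()
    {
        object o = 123; // a boxed int
        try
        {
            long l = (long)o; // invalid: the box contains an int, not a long
            return l < 0;     // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(BoxHoldsCopy());           // 123
        Console.WriteLine(UnboxToWrongTypeThrows()); // True
    }
}
```

To get a long out of the box you would unbox to int first and then convert: (long)(int)o.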

Why is this important?

Boxing and unboxing are computationally expensive compared with simple assignments: when a value type is boxed, a new object must be allocated on the Heap and the value copied into it. Code that performs a lot of boxing may run into performance issues.

You can determine which methods and properties will cause boxing by checking their signatures. If a parameter is of type object or of an interface type, any value type instance passed to it will be boxed.
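
As a rough illustration of reading signatures, the sketch below contrasts parameters typed as object or as an interface with a generic parameter. The class and method names are hypothetical; object.ReferenceEquals, IComparable and IEquatable<T> are the real framework members.

```csharp
using System;

static class SignatureDemo
{
    // object.ReferenceEquals(object, object): each int argument is boxed
    // separately, so two different heap objects are compared.
    public static bool SameVariableIsSameReference()
    {
        int i = 42;
        return object.ReferenceEquals(i, i); // false
    }

    // Both parameters cause boxing when ints are passed:
    // one is an interface type, the other is object.
    public static int CompareViaInterface(IComparable a, object b)
    {
        return a.CompareTo(b);
    }

    // Generic parameters: no boxing, because T is int at the call site and
    // IEquatable<T>.Equals(T) takes the value directly.
    public static bool AreEqual<T>(T a, T b) where T : IEquatable<T>
    {
        return a.Equals(b);
    }

    static void Main()
    {
        Console.WriteLine(SameVariableIsSameReference());  // False
        Console.WriteLine(CompareViaInterface(1, 2) < 0);  // True
        Console.WriteLine(AreEqual(5, 5));                 // True
    }
}
```

Where you control the signature, a generic parameter is one common way to accept value types without boxing them.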

Note that it is very easy to slip into boxing operations:

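// Requires: using System.Text;
var sb = new StringBuilder();
for (int i = 0; i < 10; i++)
{
    // AppendFormat takes object arguments, so both ints are boxed on every iteration
    sb.AppendFormat("Boxing: {0}, {1}", i, i + 1);
}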

You could get around this by calling the ToString() method yourself before passing the values. The strings are still allocated on the Heap, but the extra allocation for each boxed int is avoided:

var sb = new StringBuilder();
for(int i = 0; i < 10; i++)
{
    sb.AppendFormat("No boxing: {0}, {1}", i.ToString(), (i + 1).ToString()); // No boxing   
}

Value types and reference types

Again MSDN documentation provides a description of value types:

“Variables that are based on value types directly contain values. Assigning one value type variable to another copies the contained value. This differs from the assignment of reference type variables, which copies a reference to the object but not the object itself.” **
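
The difference in assignment behaviour can be seen with a small sketch. The two point types below are invented for the example and differ only in being a struct or a class.

```csharp
using System;

struct PointValue { public int X; }      // value type
class PointReference { public int X; }   // reference type

static class CopyDemo
{
    public static int CopyValueType()
    {
        var a = new PointValue { X = 1 };
        var b = a;   // copies the contained value
        b.X = 2;     // changes b only
        return a.X;  // 1
    }

    public static int CopyReferenceType()
    {
        var a = new PointReference { X = 1 };
        var b = a;   // copies the reference; a and b point at the same object
        b.X = 2;     // visible through a as well
        return a.X;  // 2
    }

    static void Main()
    {
        Console.WriteLine(CopyValueType());     // 1
        Console.WriteLine(CopyReferenceType()); // 2
    }
}
```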

There are two main categories of value types:

  • Structs (e.g. numeric types: integral types, floating-point types and decimal; bool; user defined structs)
  • Enumerations

In essence, value types are lightweight objects that are usually allocated on the current thread's stack (see Stack and heap below). There are exceptions: a value type that is an element of an array, a field of a reference type, or boxed lives on the Heap. When an instance of a value type is created, a single space in memory is allocated to store the value. When a reference type is instantiated, an object is created on the Heap and is handled through a reference (somewhat like a pointer).

Stack and heap

The Stack can be thought of as being responsible for keeping track of what's executing in the code (the call stack). When a method is called, a frame holding its arguments and local variables is pushed onto the Stack. Once the method returns, the frame is popped off the Stack and discarded. Only the ‘top’ item in the Stack is available (typical behaviour for a stack, which is a last-in-first-out memory structure).

The Heap can be thought of as being responsible for keeping track of the objects referenced by the code. Unlike the Stack, any object in the Heap can be accessed at any time, provided the code holds a reference to it.

Garbage collection

The Stack does not require garbage collection. It is self-maintaining because each frame is popped off and discarded when its method returns. The Heap, on the other hand, is subject to garbage collection.

* Boxing and Unboxing (C# Programming Guide)
** Value Types (C# Reference)
